Message ID | 20161214150840.10899-1-alex@alex.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
14.12.2016 18:08, Alex Bligh wrote: > (NB: I've already applied this and pushed it) > > * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries > and add a count of queries so we can extend the command later (rather than > rely on the length of option) > > * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero > length query) list all contexts, as absence of any query is now simple. > > * Move definition of namespaces in the document to somewhere more appopriate. > > * Various other minor changes as discussed on the mailing list > > Signed-off-by: Alex Bligh <alex@alex.org.uk> > --- > doc/proto.md | 179 +++++++++++++++++++++++++++++++++++++---------------------- > 1 file changed, 112 insertions(+), 67 deletions(-) > > diff --git a/doc/proto.md b/doc/proto.md > index e03f434..6955c76 100644 > --- a/doc/proto.md > +++ b/doc/proto.md > @@ -683,15 +683,20 @@ by other users, so has been moved out of the "experimental" section. > > ## Metadata querying > > -With the availability of sparse storage formats, it is often needed to > -query the status of a particular range and read only those blocks of > -data that are actually present on the block device. > +It is often helpful for the client to be able to query the status > +of a range of blocks. The nature of the status that can be > +queried is in part implementation dependent. For instance, > +the status might represent: > > -Some storage formats and operations over such formats express a > -concept of data dirtiness. Whether the operation is block device > -mirroring, incremental block device backup or any other operation with > -a concept of data dirtiness, they all share a need to provide a list > -of ranges that this particular operation treats as dirty. > +* in a sparse storage format, whether the relevant blocks are > + actually present on the backing device for the export; or > + > +* whether the relevant blocks are 'dirty'; some storage formats > + and operations over such formats express a concept of data dirtiness. > + Whether the operation is block device mirroring, incremental block > + device backup or any other operation with a concept of data dirtiness, > + they all share a need to provide a list of ranges that this > + particular operation treats as dirty. > > To provide such classes of information, the NBD protocol has a generic > framework for querying metadata; however, its use must first be > @@ -699,9 +704,11 @@ negotiated, and one or more metadata contexts must be selected. > > The procedure works as follows: > > -- First, during negotiation, the client MUST select one or more metadata > +- First, during negotiation, if the client wishes to query metadata > + during transmission, the client MUST select one or more metadata > contexts with the `NBD_OPT_SET_META_CONTEXT` command. If needed, the client > - can use `NBD_OPT_LIST_META_CONTEXT` to list contexts. > + can use `NBD_OPT_LIST_META_CONTEXT` to list contexts that the server > + supports. > - During transmission, a client can then indicate interest in metadata > for a given region by way of the `NBD_CMD_BLOCK_STATUS` command, where > *offset* and *length* indicate the area of interest. The server MUST > @@ -715,13 +722,37 @@ The procedure works as follows: > of type `NBD_REPLY_TYPE_BLOCK_STATUS`. > > A client MUST NOT use `NBD_CMD_BLOCK_STATUS` unless it selected a > -nonzero number of metadata contexts during negotiation. Servers SHOULD > -reply to clients doing so anyway with `EINVAL`. > +non-zero number of metadata contexts during negotiation. Servers SHOULD > +reply to clients sending `NBD_CMD_BLOCK_STATUS without backquote > +selecting metadata contexts with `EINVAL`. > > -The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent by a > +The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent as a > structured reply; this implies that in order to use metadata querying, > structured replies MUST be negotiated first. > > +Metadata contexts are identified by their names. The name MUST > +consist of a namespace, followed by a colon, followed by a leaf-name. > +The namespace and the leaf-name must each consist entirely of > +printable non-whitespace UTF-8 characters other than colons, > +and be non-empty. The entire name (namespace, colon and leaf-name) > +MUST NOT exceed 255 bytes (and therefore botht he namespace and > +leaf-name are guaranteed to be smaller than 255 bytes). > + > +Namespaces MUST be consist of one of the following: > +- `base`, for metadata contexts defined by this document; > +- `nbd-server`, for metadata contexts defined by the > + implementation that accompanies this document (none > + currently); > +- `x-*`, where `*` can be replaced by any random string not > + containing colons, for local experiments. This SHOULD NOT be > + used by metadata contexts that are expected to be widely used. > +- A third-party namespace from the list below. > + > +Third-party implementations can register additional namespaces by simple > +request to the mailing-list. The following additional third-party namespaces > +are currently registered: > +* (none) > + > This standard defines exactly one metadata context; it is called > `base:allocation`, and it provides information on the basic allocation > status of extents (that is, whether they are allocated at all in a > @@ -918,9 +949,7 @@ of the newstyle negotiation. > - `NBD_OPT_LIST_META_CONTEXT` (10) > > Return a list of `NBD_REP_META_CONTEXT` replies, one per context, > - followed by an `NBD_REP_ACK`. If a server replies to such a request > - with no error message, clients MAY send NBD_CMD_BLOCK_STATUS > - commands during the transmission phase. > + followed by an `NBD_REP_ACK`. > > If the query string is syntactically invalid, the server SHOULD send > `NBD_REP_ERR_INVALID`. If the query string is syntactically valid > @@ -932,51 +961,58 @@ of the newstyle negotiation. > SHOULD send `NBD_REP_ERR_INVALID`. > > Data: > - - 32 bits, length of export name > - - String, name of export for which we wish to list or select metadata > + - 32 bits, length of export name. > + - String, name of export for which we wish to list metadata > contexts. > - - 32 bits, length of query > - - String, query to select a subset of the available metadata > - contexts. If this is not specified (i.e., the "length of query" > - field is 0 and no query is sent), then the server MUST send all > - the metadata contexts it knows about. If specified, this query > - string MUST start with a name that uniquely identifies a server > - implementation; e.g., the reference implementation that > - accompanies this document would support query strings starting > - with 'nbd-server:' > - > - The server MUST reply with a list of `NBD_REP_META_CONTEXT` replies, > - followed by `NBD_REP_ACK`. The metadata context ID in these replies > - is reserved and SHOULD be set to zero; clients SHOULD disregard it. > + - 32 bits, number of queries > + - Zero or more queries, each being: > + - 32 bits, length of query. > + - String, query to list a subset of the available metadata > + contexts. > + > + If zero queries are sent, then the server MUST return all > + the metadata contexts it knows about. I think that 'all .. it knows about' is too much. What about 'return all available ..'? Anyway 'all ... it knows about' actually equals to 'all ... it wants'. There may be some private, or unrelated contexts, for example.. > + > + For details on the query string, see under `NBD_OPT_SET_META_CONTEXT`. > + > + The server MUST either reply with an error (for instance `EINVAL` > + if the option is not supported), or reply with a list of > + `NBD_REP_META_CONTEXT` replies followed by `NBD_REP_ACK`. > + The metadata context ID in these replies is reserved and SHOULD be > + set to zero; clients MUST disregard it. > > - `NBD_OPT_SET_META_CONTEXT` (11) > > Change the set of active metadata contexts. Issuing this command > replaces all previously-set metadata contexts; clients must ensure > - that all metadata contexts they're interested in are selected with > + that all metadata contexts they are interested in are selected with > the final query that they sent. > > + A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless > + within the negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` > + at least once, and the final time it was sent, the server > + responded without an error, and returned at least one metadata > + context. > + > Data: > - - 32 bits, length of query > - - String, query to select metadata contexts. The syntax of this > - query is implementation-defined, except that it MUST start with a > - namespace. This namespace may be one of the following: > - - `base:`, for metadata contexts defined by this document; > - - `nbd-server:`, for metadata contexts defined by the > - implementation that accompanies this document (none > - currently); > - - `x-*:`, where `*` can be replaced by any random string not > - containing colons, for local experiments. This SHOULD NOT be > - used by metadata contexts that are expected to be widely used. > - - third-party implementations can register additional > - namespaces by simple request to the mailinglist. > - > - These two fields MAY be repeated as much as is necessary to select all > - metadata contexts the client is interested in. > + - 32 bits, length of export name. > + - String, name of export for which we wish to list metadata > + contexts. > + - 32 bits, number of queries > + - Zero or more queries, each being: > + - 32 bits, length of query > + - String, query to select metadata contexts. The syntax of this > + query is implementation-defined, except that it MUST start with a > + namespace and a colon. > + > + If zero queries are sent, the server MUST select no metadata contexts. > > The server MUST reply with a number of `NBD_REP_META_CONTEXT` > replies, one for each selected metadata context, each with a unique > - metadata context ID. It is not an error if a > + metadata context ID, followed by `NBD_REP_ACK`. The metadata context > + ID is transient and may vary across calls to `NBD_OPT_SET_META_CONTEXT`; > + clients MUST therefore treat the ID as an opaque value and not (for > + instance) cache it between connections. It is not an error if a > `NBD_OPT_SET_META_CONTEXT` option does not select any metadata > context, provided the client then does not attempt to issue > `NBD_CMD_BLOCK_STATUS` commands. > @@ -1071,9 +1107,10 @@ The `base:allocation` metadata context is the basic "allocated at all" > metadata context. If an extent is marked with `NBD_STATE_HOLE` at that > context, this means that the given extent is not allocated in the > backend storage, and that writing to the extent MAY result in the ENOSPC > -error. This supports sparse file semantics on the server side. If a > -server has only one metadata context (the default), then writing to an > -extent which has `NBD_STATE_HOLE` clear MUST NOT fail with ENOSPC. > +error. This supports sparse file semantics on the server side. > +If a server supports the `base:allocation` metadata context, then writing > +to an extent which has `NBD_STATE_HOLE` clear MUST NOT fail with ENOSPC > +unless for reasons specified in the definition of another context. > > It defines the following flags for the flags field: > > @@ -1082,20 +1119,21 @@ It defines the following flags for the flags field: > `ENOSPC` error); if clear, the block is allocated or the server could > not otherwise determine its status. Note that the use of > `NBD_CMD_TRIM` is related to this status, but that the server MAY > - report a hole even where trim has not been requested, and also that a > - server MAY report metadata even where a trim has been requested. > + report a hole even where `NBD_CMD_TRIM` has not been requested, and > + also that a server MAY report that the block is allocated even where > + `NBD_CMD_TRIM` has been requested. > - `NBD_STATE_ZERO` (bit 1): if set, the block contents read as all > zeroes; if clear, the block contents are not known. Note that the use > of `NBD_CMD_WRITE_ZEROES` is related to this status, but that the > - server MAY report zeroes even where write zeroes has not been > + server MAY report zeroes even where `NBD_CMD_WRITE_ZEROES` has not been > requested, and also that a server MAY report unknown content even > - where write zeroes has been requested. > + where `NBD_CMD_WRITE_ZEROES` has been requested. > > It is not an error for a server to report that a region of the > export has both `NBD_STATE_HOLE` set and `NBD_STATE_ZERO` clear. The > -contents of such an area is undefined, and may not be stable; > -clients who are aware of the existence of such a region SHOULD NOT > -read it. > +contents of such an area are undefined, and a client > +reading such an area should make no assumption as to its contents > +or stability. > > For the `base:allocation` context, the remainder of the flags field is > reserved. Servers SHOULD set it to all-zero; clients MUST ignore unknown > @@ -1150,9 +1188,10 @@ valid may depend on negotiation during the handshake phase. > `EOVERFLOW` error chunk, if the request length is too large. > - bit 3, `NBD_CMD_FLAG_REQ_ONE`; valid during `NBD_CMD_BLOCK_STATUS`. If > set, the client is interested in only one extent per metadata > - context. If this flag is present, the server SHOULD NOT send metadata > - on more than one extent in the reply. Clients SHOULD NOT use this flag > - on multiple requests for successive regions in the export. > + context. If this flag is present, the server MUST NOT send metadata > + on more than one extent in the reply. Client implementors should note > + that using this flag on multiple contiguous requests is likely to be > + inefficient. > > ##### Structured reply flags > > @@ -1225,7 +1264,7 @@ interpret the "length" bytes of payload. > > *length* MUST be 4 + (a positive integer multiple of 8). This reply > represents a series of consecutive block descriptors where the sum > - of the lengths of the descriptors MUST not be greater than the > + of the length fields within the descriptors MUST not be greater than the > length of the original request. This chunk type MUST appear exactly > once per metadata ID in a structured reply. > > @@ -1236,14 +1275,15 @@ interpret the "length" bytes of payload. > and is followed by a list of one or more descriptors, each with this > layout: > > - * 32 bits, length (unsigned, MUST NOT be zero) > + * 32 bits, length of the extent to which the status below > + applies (unsigned, MUST be non-zero) > * 32 bits, status flags > > If the client used the `NBD_CMD_FLAG_REQ_ONE` flag in the request, > then every reply chunk MUST NOT contain more than one descriptor. > > Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in > - its request, the server MAY return less descriptors in the reply > + its request, the server MAY return fewer descriptors in the reply > than would be required to fully specify the whole range of requested > information to the client, if looking up the information would be > too resource-intensive for the server, so long as at least one > @@ -1462,8 +1502,13 @@ The following request types exist: > * `NBD_CMD_BLOCK_STATUS` (7) > > A block status query request. Length and offset define the range of > - interest. Clients MUST NOT use this request unless metadata > - contexts have been negotiated, which in turn requires the client to > + interest. > + > + A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless > + within the negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` > + at least once, and the final time it was sent, the server > + responded without an error, and returned at least one metadata > + context. This in turn requires the client to > first negotiate structured replies. For a successful return, the > server MUST use a structured reply, containing at least one chunk of > type `NBD_REPLY_TYPE_BLOCK_STATUS`, where the status field of each
Vladimir, >> +non-zero number of metadata contexts during negotiation. Servers SHOULD >> +reply to clients sending `NBD_CMD_BLOCK_STATUS without > > backquote Fixed >> + If zero queries are sent, then the server MUST return all >> + the metadata contexts it knows about. > > I think that 'all .. it knows about' is too much. What about 'return all available ..'? Anyway 'all ... it knows about' actually equals to 'all ... it wants'. There may be some private, or unrelated contexts, for example.. This was not my wording, but I've changed it anyway to: If zero queries are sent, then the server MUST return all the metadata contexts that are available to the client to select on the given export with `NBD_OPT_SET_META_CONTEXT`. I think if they are available to select, we should list them. Thanks also for reminding me to document why I put the export name into the _LIST_ data (as it is for _SET_). However, this raises another question. Wouter deliberately made the query format freeform so that you could e.g. set a context like: backup:modtime>201612081034 which might (in theory) return a list of blocks which are newer than the given timestamp. It would clearly be impossible to return all such contexts. I wonder if we should carve out an exception here.
On 12/14/2016 09:08 AM, Alex Bligh wrote: > (NB: I've already applied this and pushed it) > > * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries > and add a count of queries so we can extend the command later (rather than > rely on the length of option) > > * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero > length query) list all contexts, as absence of any query is now simple. > > * Move definition of namespaces in the document to somewhere more appopriate. > > * Various other minor changes as discussed on the mailing list > > +Metadata contexts are identified by their names. The name MUST > +consist of a namespace, followed by a colon, followed by a leaf-name. > +The namespace and the leaf-name must each consist entirely of > +printable non-whitespace UTF-8 characters other than colons, > +and be non-empty. The entire name (namespace, colon and leaf-name) > +MUST NOT exceed 255 bytes (and therefore botht he namespace and s/botht he/both the/
> On 14 Dec 2016, at 16:58, Eric Blake <eblake@redhat.com> wrote: > > s/botht he/both the/ Thanks - fixed -- Alex Bligh
14.12.2016 19:38, Alex Bligh wrote: > Vladimir, > >>> +non-zero number of metadata contexts during negotiation. Servers SHOULD >>> +reply to clients sending `NBD_CMD_BLOCK_STATUS without >> backquote > Fixed > >>> + If zero queries are sent, then the server MUST return all >>> + the metadata contexts it knows about. >> I think that 'all .. it knows about' is too much. What about 'return all available ..'? Anyway 'all ... it knows about' actually equals to 'all ... it wants'. There may be some private, or unrelated contexts, for example.. > This was not my wording, but I've changed it anyway to: > > If zero queries are sent, then the server MUST return all > the metadata contexts that are available to the client to select > on the given export with `NBD_OPT_SET_META_CONTEXT`. > > I think if they are available to select, we should list them. Thanks > also for reminding me to document why I put the export name into the > _LIST_ data (as it is for _SET_). > > However, this raises another question. Wouter deliberately made the > query format freeform so that you could e.g. set a context like: > > backup:modtime>201612081034 > > which might (in theory) return a list of blocks which are newer than > the given timestamp. It would clearly be impossible to return all such > contexts. I wonder if we should carve out an exception here. > Interesting. Even query 'backup:modtime>*' would be a problem, not only empty query list. Actually, we do not need to 'list contexts', as it is about management, not about data transfer. We only need a way to check, that particular query selects all needed contexts and no others. Just to be sure that we are know, what we will select by NBD_OPT_SET_META_CONTEXT with _same_ query. So, I suggest just to say, that _LIST_ can return error if too much contexts are selected.|And same should be done for _SET_. And it should be documented that this result of query (list or error) should be equal for these two commands. |
On Wed, Dec 14, 2016 at 03:08:40PM +0000, Alex Bligh wrote: > (NB: I've already applied this and pushed it) Thanks. > * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries > and add a count of queries so we can extend the command later (rather than > rely on the length of option) Sure, that works. > * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero > length query) list all contexts, as absence of any query is now simple. > > * Move definition of namespaces in the document to somewhere more appopriate. > > * Various other minor changes as discussed on the mailing list Right. I think we're getting close to a good spec now, for this thing. One thing I've been thinking about that we might want to add: There may be cases where a server, in performing the required calls to be able to handle a BLOCK_STATUS request, will end up with more information than the client asked; e.g., if the client asked information in the base:allocation context on an extent at offset X of length Y, then the server might conceivably do an lseek(SEEK_DATA) and/or lseek(SEEK_HOLE). The result of that call might be that the file offset will now point to a location Z, where Z > (X+Y). Currently, our spec says "the sum of the *length* fields MUST not be greater than the overall *length* of request". This means that in essense, the server then has to throw away the information it has on the range Z - (X + Y). In case a client was interested in that information, that seems like a waste. I would therefore like to remove that requirement, by rephrasing that "sum of the *length* fields" thing into something along the following lines: In case the server returns N extents, the sum of the *length* fields of the first N-1 extents MUST NOT be greater than the overall *length* of the request. The final extent MAY exceed the length of the request if the server has that information anyway as a side effect of looking up the required information and wishes to share it. This would then result in the fact that the "length" field in the BLOCK_STATUS command would be little more than a hint, since we're saying that a server can return more data than requested (if it's available anyway) and less data than requested (if it would be too resource-intensive to provide all the information). Not sure whether all that makes much sense anymore, but hey. In addition, the combination of a server providing more information than requested with a "REQ_ONE" flag and a length field of zero could be an interesting way to enumerate a whole export, too. Essentially, we could define that as a client saying "I'm interested in what the size of the extent at offset X is, and what its properties are". Thoughts?
14.12.2016 20:09, Wouter Verhelst wrote: > On Wed, Dec 14, 2016 at 03:08:40PM +0000, Alex Bligh wrote: >> (NB: I've already applied this and pushed it) > Thanks. > >> * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries >> and add a count of queries so we can extend the command later (rather than >> rely on the length of option) > Sure, that works. > >> * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero >> length query) list all contexts, as absence of any query is now simple. >> >> * Move definition of namespaces in the document to somewhere more appopriate. >> >> * Various other minor changes as discussed on the mailing list > Right. I think we're getting close to a good spec now, for this thing. > > One thing I've been thinking about that we might want to add: > > There may be cases where a server, in performing the required calls to > be able to handle a BLOCK_STATUS request, will end up with more > information than the client asked; e.g., if the client asked information > in the base:allocation context on an extent at offset X of length Y, > then the server might conceivably do an lseek(SEEK_DATA) and/or > lseek(SEEK_HOLE). The result of that call might be that the file offset > will now point to a location Z, where Z > (X+Y). > > Currently, our spec says "the sum of the *length* fields MUST not be > greater than the overall *length* of request". This means that in > essense, the server then has to throw away the information it has on the > range Z - (X + Y). In case a client was interested in that information, > that seems like a waste. I would therefore like to remove that > requirement, by rephrasing that "sum of the *length* fields" thing into > something along the following lines: > > In case the server returns N extents, the sum of the *length* fields > of the first N-1 extents MUST NOT be greater than the overall *length* > of the request. The final extent MAY exceed the length of the request > if the server has that information anyway as a side effect of looking > up the required information and wishes to share it. > > This would then result in the fact that the "length" field in the > BLOCK_STATUS command would be little more than a hint, since we're > saying that a server can return more data than requested (if it's > available anyway) and less data than requested (if it would be too > resource-intensive to provide all the information). Not sure whether all > that makes much sense anymore, but hey. > > In addition, the combination of a server providing more information than > requested with a "REQ_ONE" flag and a length field of zero could be an > interesting way to enumerate a whole export, too. Essentially, we could > define that as a client saying "I'm interested in what the size of the > extent at offset X is, and what its properties are". > > Thoughts? > Good, I'm for it. May be, in such wording there are too much information about implementation (again, server can do it if it _wants_, is it a side effect or not is implementation defined). In other words, "if the server..." is better read as an example, not requirement. But it's not important.
> On 14 Dec 2016, at 17:05, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >> >> However, this raises another question. Wouter deliberately made the >> query format freeform so that you could e.g. set a context like: >> >> backup:modtime>201612081034 >> >> which might (in theory) return a list of blocks which are newer than >> the given timestamp. It would clearly be impossible to return all such >> contexts. I wonder if we should carve out an exception here. >> >> > > Interesting. Even query 'backup:modtime>*' would be a problem, not only empty query list. > > Actually, we do not need to 'list contexts', as it is about management, not about data transfer. We only need a way to check, that particular query selects all needed contexts and no others. Just to be sure that we are know, what we will select by NBD_OPT_SET_META_CONTEXT with _same_ query. > > So, I suggest just to say, that _LIST_ can return error if too much contexts are selected. And same should be done for _SET_. And it should be documented that this result of query (list or error) should be equal for these two commands. (two CCs that always bounce removed) Hmmm... Well in the '*' case, it's up to the namespace as to how it parses things, so '*' could be prohibited entirely or mean 'return the first 20 matching' or similar. So that's less of a problem. And 'set all' doesn't exist (unlike 'list all'). I think in the LIST case we should allow the server to omit contexts under certain circumstances, and allow SET to return ETOOBIG.
Wouter, > Right. I think we're getting close to a good spec now, for this thing. > > One thing I've been thinking about that we might want to add: > > There may be cases where a server, in performing the required calls to > be able to handle a BLOCK_STATUS request, will end up with more > information than the client asked; e.g., if the client asked information > in the base:allocation context on an extent at offset X of length Y, > then the server might conceivably do an lseek(SEEK_DATA) and/or > lseek(SEEK_HOLE). The result of that call might be that the file offset > will now point to a location Z, where Z > (X+Y). > > Currently, our spec says "the sum of the *length* fields MUST not be > greater than the overall *length* of request". This means that in > essense, the server then has to throw away the information it has on the > range Z - (X + Y). In case a client was interested in that information, > that seems like a waste. I would therefore like to remove that > requirement, by rephrasing that "sum of the *length* fields" thing into > something along the following lines: > > In case the server returns N extents, the sum of the *length* fields > of the first N-1 extents MUST NOT be greater than the overall *length* > of the request. The final extent MAY exceed the length of the request > if the server has that information anyway as a side effect of looking > up the required information and wishes to share it. > > This would then result in the fact that the "length" field in the > BLOCK_STATUS command would be little more than a hint, since we're > saying that a server can return more data than requested (if it's > available anyway) and less data than requested (if it would be too > resource-intensive to provide all the information). Not sure whether all > that makes much sense anymore, but hey. +1 > In addition, the combination of a server providing more information than > requested with a "REQ_ONE" flag and a length field of zero could be an > interesting way to enumerate a whole export, too. Essentially, we could > define that as a client saying "I'm interested in what the size of the > extent at offset X is, and what its properties are". Also +1
14.12.2016 20:36, Alex Bligh wrote: >> On 14 Dec 2016, at 17:05, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >>> However, this raises another question. Wouter deliberately made the >>> query format freeform so that you could e.g. set a context like: >>> >>> backup:modtime>201612081034 >>> >>> which might (in theory) return a list of blocks which are newer than >>> the given timestamp. It would clearly be impossible to return all such >>> contexts. I wonder if we should carve out an exception here. >>> >>> >> Interesting. Even query 'backup:modtime>*' would be a problem, not only empty query list. >> >> Actually, we do not need to 'list contexts', as it is about management, not about data transfer. We only need a way to check, that particular query selects all needed contexts and no others. Just to be sure that we are know, what we will select by NBD_OPT_SET_META_CONTEXT with _same_ query. >> >> So, I suggest just to say, that _LIST_ can return error if too much contexts are selected. And same should be done for _SET_. And it should be documented that this result of query (list or error) should be equal for these two commands. > (two CCs that always bounce removed) > > Hmmm... Well in the '*' case, it's up to the namespace as to how it parses things, so '*' could be prohibited entirely or mean 'return the first 20 matching' or similar. So that's less of a problem. And 'set all' doesn't exist (unlike 'list all'). > > I think in the LIST case we should allow the server to omit contexts under certain circumstances, and allow SET to return ETOOBIG. > We can't prohibit '*' as query string as implementation defined. And we shouldn't (I think) define its meaning. Opposite, we can define, that query must not select more than 20 contexts, but I'm not sure that this is good. So, do you mean list('backup:modtime>*') -> empty list set('backup:modtime>*') -> ETOOBIG ? For me this looks strange.
On Wed, Dec 14, 2016 at 08:55:56PM +0300, Vladimir Sementsov-Ogievskiy wrote: > 14.12.2016 20:36, Alex Bligh wrote: > >> On 14 Dec 2016, at 17:05, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > >>> However, this raises another question. Wouter deliberately made the > >>> query format freeform so that you could e.g. set a context like: > >>> > >>> backup:modtime>201612081034 > >>> > >>> which might (in theory) return a list of blocks which are newer than > >>> the given timestamp. It would clearly be impossible to return all such > >>> contexts. I wonder if we should carve out an exception here. > >>> > >>> > >> Interesting. Even query 'backup:modtime>*' would be a problem, not only empty query list. > >> > >> Actually, we do not need to 'list contexts', as it is about management, not about data transfer. We only need a way to check, that particular query selects all needed contexts and no others. Just to be sure that we are know, what we will select by NBD_OPT_SET_META_CONTEXT with _same_ query. > >> > >> So, I suggest just to say, that _LIST_ can return error if too much contexts are selected. And same should be done for _SET_. And it should be documented that this result of query (list or error) should be equal for these two commands. > > (two CCs that always bounce removed) > > > > Hmmm... Well in the '*' case, it's up to the namespace as to how it parses things, so '*' could be prohibited entirely or mean 'return the first 20 matching' or similar. So that's less of a problem. And 'set all' doesn't exist (unlike 'list all'). > > > > I think in the LIST case we should allow the server to omit contexts under certain circumstances, and allow SET to return ETOOBIG. > > > > We can't prohibit '*' as query string as implementation defined. And we > shouldn't (I think) define its meaning. Opposite, we can define, that > query must not select more than 20 contexts, but I'm not sure that this > is good. No, I don't think so, either. I can see how "list all contexts" might be difficult to implement, in the example of the dynamic contexts that I pointed out earlier. We could muddle the spec further to fix that (hypothetical) problem, but I'm not sure that's worth it. Instead, I would suggest that the spec leave it up to the namespace spec to define what gets listed when. The namespace should still give some indication that a particular *type* of context is available, though; e.g., _LIST_ could return 'backup:snapshot-diff{*}' as well as a list of 'backup:snapshot{snapshot-X}' contexts, where the latter are all available snapshots, and the 'snapshot-diff' bit implies that the whole cartesian product of all possible diffs between snapshots is possible in a format of 'backup:snapshot-diff{snapshot-X:snapshot-Y}', or some such (you get the idea). The spec currently says that a server SHOULD allow choosing exports by simply naming them, but doesn't make that a MUST; and it also says that the query format is defined by the namespace. If we simply clarify that the output of _LIST_ doesn't necessarily need to be a full list of all that's available (that it may be symbolic in the case of namespaces that allow things to be dynamically defined), we should be good.
> On 14 Dec 2016, at 17:55, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > >> Hmmm... Well in the '*' case, it's up to the namespace as to how it parses things, so '*' could be prohibited entirely or mean 'return the first 20 matching' or similar. So that's less of a problem. And 'set all' doesn't exist (unlike 'list all'). >> >> I think in the LIST case we should allow the server to omit contexts under certain circumstances, and allow SET to return ETOOBIG. >> > > We can't prohibit '*' as query string as implementation defined. And we shouldn't (I think) define its meaning. Those two statements appear to me to contradict each other. The current documentation *does* define the query string (to the right of the colon) as implementation defined. I'm thus not sure what you mean. > Opposite, we can define, that query must not select more than 20 contexts, but I'm not sure that this is good. Each query can select from only one namespace in the current document (Wouter explained why). However, you can specify multiple queries. I don't think we need to define a hard limit of a particular number. > So, do you mean > > list('backup:modtime>*') -> empty list The result of that would depend on however the 'backup' context was defined, meaning it had the options of: * ETOOBIG * Listing some subset of available contexts (arguably 'none' is a subset) > set('backup:modtime>*') -> ETOOBIG Again, the result of that would depend on however the 'backup' context was defined, meaning it had the options of: * ETOOBIG * Listing some subset of available contexts (arguably 'none' is a subset) ... and it need not make the same choice as above, but I agree it would be logical for it to do so. As any form of wildcarding within a query refers to one particular namespace (as a query by its nature specifies a single namespace), we don't have to worry about the way wildcarding works in the standard, as Wouter pointed out. We merely need to provide that ETOOBIG and listing a subset are acceptable responses. What we do need to decide is what the response to list() (i.e. a list with a zero length list of queries) does. It's currently defined to return every context from every namespace. Options would include * ETOOBIG * Listing some subset of available contexts (arguably 'none' is a subset) * Allowing abbreviation of algorithmically specified contexts (e.g. in this case just returning 'backup:' to represent all available backup contexts)
> On 14 Dec 2016, at 18:13, Wouter Verhelst <w@uter.be> wrote: > > Instead, I would suggest that the spec leave it up to the namespace spec > to define what gets listed when. The namespace should still give some > indication that a particular *type* of context is available, though; > e.g., _LIST_ could return 'backup:snapshot-diff{*}' as well as a list of > 'backup:snapshot{snapshot-X}' contexts, where the latter are all > available snapshots, and the 'snapshot-diff' bit implies that the whole > cartesian product of all possible diffs between snapshots is possible in > a format of 'backup:snapshot-diff{snapshot-X:snapshot-Y}', or some such > (you get the idea). > > The spec currently says that a server SHOULD allow choosing exports by > simply naming them, but doesn't make that a MUST; and it also says that > the query format is defined by the namespace. If we simply clarify that > the output of _LIST_ doesn't necessarily need to be a full list of all > that's available (that it may be symbolic in the case of namespaces that > allow things to be dynamically defined), we should be good. Let me have a go at a change. However, note that Vladimir was answering a slightly different question. I was asking about listing ALL contexts (previously an empty query string, now a zero length list of queries), not a 'backup:*' type thing.
On Wed, Dec 14, 2016 at 06:18:58PM +0000, Alex Bligh wrote: > > > On 14 Dec 2016, at 18:13, Wouter Verhelst <w@uter.be> wrote: > > > > Instead, I would suggest that the spec leave it up to the namespace spec > > to define what gets listed when. The namespace should still give some > > indication that a particular *type* of context is available, though; > > e.g., _LIST_ could return 'backup:snapshot-diff{*}' as well as a list of > > 'backup:snapshot{snapshot-X}' contexts, where the latter are all > > available snapshots, and the 'snapshot-diff' bit implies that the whole > > cartesian product of all possible diffs between snapshots is possible in > > a format of 'backup:snapshot-diff{snapshot-X:snapshot-Y}', or some such > > (you get the idea). > > > > The spec currently says that a server SHOULD allow choosing exports by > > simply naming them, but doesn't make that a MUST; and it also says that > > the query format is defined by the namespace. If we simply clarify that > > the output of _LIST_ doesn't necessarily need to be a full list of all > > that's available (that it may be symbolic in the case of namespaces that > > allow things to be dynamically defined), we should be good. > > Let me have a go at a change. > > However, note that Vladimir was answering a slightly different question. Actually not :) > I was asking about listing ALL contexts (previously an empty query > string, now a zero length list of queries), not a 'backup:*' type > thing. What I was trying to say is that I think the result to _LIST_ with no queries should return all information the client needs to theoretically build the list of all possible contexts, even if that list may be so large as to be unfeasible for it to be built (e.g., in case of a cartesian product between all possible other contexts). I gave one example, but there may be more. My point is that if the query includes a namespace, the result should not be defined by our spec. If the query does not include a namespace, the result should be "complete" by whatever definition, but not unreasonable (i.e., don't just write a cartesian product to a client). This could allow an interactive client to present a user with a list of possible contexts before performing analysis on the block device, say.
> On 14 Dec 2016, at 18:18, Alex Bligh <alex@alex.org.uk> wrote: > > Let me have a go at a change. OK I've pushed a change. I reordered a few bits (to put the base:allocation next to the stuff that talks about metadata queries at the start), but the main change is below.
Wouter, (Our mails crossed and I've actually pushed something, but no matter) > On 14 Dec 2016, at 18:49, Wouter Verhelst <w@uter.be> wrote: > > What I was trying to say is that I think the result to _LIST_ with no > queries should return all information the client needs to theoretically > build the list of all possible contexts, even if that list may be > so large as to be unfeasible for it to be built (e.g., in case of a > cartesian product between all possible other contexts). I gave one > example, but there may be more. > > My point is that if the query includes a namespace, the result should > not be defined by our spec. If the query does not include a namespace, > the result should be "complete" by whatever definition, but not > unreasonable (i.e., don't just write a cartesian product to a client). > > This could allow an interactive client to present a user with a list of > possible contexts before performing analysis on the block device, say. OK, so first of all, one of the changes I made earlier was that now each of the commands carries a list of queries, the way you list everything is not 'having a query that doesn't contain a namespace' but rather doing a _LIST_ with no queries at all. But that's semantics and orthogonal to the main point. What I've proposed (and pushed - but feel free to alter it) is that 1. on _LIST_, the server can return fewer contexts than are available if returning all of them would consume undue levels of resources. 2. on _LIST_ where the contexts are 'algorithmic', the server can return e.g. 'X-Backup:' rather than 'X-Backup:modified>' and every integer. 3. On _SET_ if too many contexts are requested, the server may return an error (I think we need this anyway). That nearly does what you ask for, but I'm not sure how you any query could 'return all the information the client needs to build the list of all possible contexts'. For instance, in my backup example 'X-Backup:modified>[integer]' doesn't itself tell you anything, as you don't know whether the integer is a unix date time, in seconds after the epoch, milliseconds or whatever. What, surely, as a client you want to know is 'does it support the X-Backup: extension because I've read the spec for that and know that it has X-Backup:modified if so'. So I've suggested it return 'X-Backup:' only in that case, in which case from that (*and the spec*) you know how to build any query.
On Wed, Dec 14, 2016 at 07:01:15PM +0000, Alex Bligh wrote: > Wouter, > > (Our mails crossed and I've actually pushed something, but no matter) > > > On 14 Dec 2016, at 18:49, Wouter Verhelst <w@uter.be> wrote: > > > > What I was trying to say is that I think the result to _LIST_ with no > > queries should return all information the client needs to theoretically > > build the list of all possible contexts, even if that list may be > > so large as to be unfeasible for it to be built (e.g., in case of a > > cartesian product between all possible other contexts). I gave one > > example, but there may be more. > > > > My point is that if the query includes a namespace, the result should > > not be defined by our spec. If the query does not include a namespace, > > the result should be "complete" by whatever definition, but not > > unreasonable (i.e., don't just write a cartesian product to a client). > > > > This could allow an interactive client to present a user with a list of > > possible contexts before performing analysis on the block device, say. > > OK, so first of all, one of the changes I made earlier was that now > each of the commands carries a list of queries, the way you list > everything is not 'having a query that doesn't contain a namespace' > but rather doing a _LIST_ with no queries at all. But that's semantics > and orthogonal to the main point. > > What I've proposed (and pushed - but feel free to alter it) is that > > 1. on _LIST_, the server can return fewer contexts than are available > if returning all of them would consume undue levels of resources. > > 2. on _LIST_ where the contexts are 'algorithmic', the server can > return e.g. 'X-Backup:' rather than 'X-Backup:modified>' and > every integer. > > 3. On _SET_ if too many contexts are requested, the server may return > an error (I think we need this anyway). > > That nearly does what you ask for, but I'm not sure how you any query > could 'return all the information the client needs to build > the list of all possible contexts'. For instance, in my backup > example 'X-Backup:modified>[integer]' doesn't itself tell you > anything, as you don't know whether the integer is a unix > date time, in seconds after the epoch, milliseconds or whatever. > What, surely, as a client you want to know is 'does it support > the X-Backup: extension because I've read the spec for that and > know that it has X-Backup:modified if so'. So I've suggested it > return 'X-Backup:' only in that case, in which case from that > (*and the spec*) you know how to build any query. Actually, it does do what I ask for :-) A client which knows about spec X, Y, or Z, should get all the information from a _LIST_ command that it needs in order to know whether or not it can request a particular context or not. E.g., in the case of a "modified more recent than X seconds ago", the _LIST_ command should expose (somehow, defined by the spec of that context) that it supports that particular syntax, without requiring a list of all possible integers between 0 and 2^64 (for obvious reasons). This does require that the client know about a particular spec before it can query it, but that's going to be the case anyway -- a client should have no business asking for a context of which it has no implementation, since that would mean it is asking the server to send it information that it is then going to throw away anyway, which makes no sense. Additionally, this also allows a client which sees that no contexts were returned for one of the queries in its _SET_ command to differentiate beween "server does not support metadata context XYZ" and "server does support metadata context XYZ, but there is no relevant metadata context for the query I sent". Allowing a server to limit the information it sends at an arbitrary cut-off point of "20" or whatever would *not* do that, so I'm against it in the strongest of terms. Put otherwise: the information sent in return of a _LIST_ command with no query string (i.e., the command that asks for "everything") should be "complete" in the sense that it should allow a client to know whether a server has support for metadata context X, Y, or Z, even if that means it may have to issue further _LIST_ queries later on, or if it may have to combine some other information first. In the absense of dynamic namespaces, that is most easily implemented by just listing all metadata contexts in all namespaces that we know about, but it doesn't *have* to be that.
On Wed, Dec 14, 2016 at 06:51:48PM +0000, Alex Bligh wrote: > > > On 14 Dec 2016, at 18:18, Alex Bligh <alex@alex.org.uk> wrote: > > > > Let me have a go at a change. > > OK I've pushed a change. I reordered a few bits (to put the > base:allocation next to the stuff that talks about metadata > queries at the start), but the main change is below. > > -- > Alex Bligh > > > > @@ -970,15 +1029,38 @@ of the newstyle negotiation. > - String, query to list a subset of the available metadata > contexts. > > - If zero queries are sent, then the server MUST return all > - the metadata contexts that are available to the client to select > - on the given export with `NBD_OPT_SET_META_CONTEXT`. > - > For details on the query string, see under `NBD_OPT_SET_META_CONTEXT`. > > The server MUST either reply with an error (for instance `EINVAL` > if the option is not supported), or reply with a list of > `NBD_REP_META_CONTEXT` replies followed by `NBD_REP_ACK`. > + > + If zero queries are sent, then the server MUST return all > + the metadata contexts that are available to the client to select > + on the given export with `NBD_OPT_SET_META_CONTEXT`, save that: > + > + If one or more queries are sent, then the server MUST return > + those metadata contexts that are available to the client to > + select on the given export with `NBD_OPT_SET_META_CONTEXT`, > + and which match one or more of the queries given. The > + support of wildcarding within the leaf-name portion of > + the query string is dependent upon the namespace. > + > + In either case, however: > + > + * the server MAY return an incomplete list if returning a > + complete list would require undue resources. This part is a bad idea, as per my other mail. Returning a complete list should not be optional, but the definition of "complete" doesn't have to imply "list all possible context names". > + * the server MAY return a context consisting of a namespace and > + a colon only (i.e. omitting the leaf-name) to indicate that > + the namespace contains a large number of possible contexts > + within that namespace (for instance a namespace `X-backup` with > + contexts that indicate whether blocks were written after > + a given date might accept queries of the form > + `'X-backup:modifiedtime>[unixdate]'` where `[unixdate]` is an > + arbitrary integer, and in this case it might simply > + return `X-backup:`) This is way too detailed, I think. It should just allow namespaces to define what _LIST_ may return, as long as the client is somehow able to distill (through knowledge of the spec as well as the information sent in reply to the _LIST_ command) all the metadata contexts it can possibly select. > The metadata context ID in these replies is reserved and SHOULD be > set to zero; clients MUST disregard it. > > @@ -1009,7 +1091,9 @@ of the newstyle negotiation. > > If zero queries are sent, the server MUST select no metadata contexts. > > - The server MUST reply with a number of `NBD_REP_META_CONTEXT` > + The server MAY return `NBD_REP_ERR_TOO_BIG` if a request > + seeks to select too many contexts. Otherwise > + the server MUST reply with a number of `NBD_REP_META_CONTEXT` This part makes sense. > replies, one for each selected metadata context, each with a unique > metadata context ID, followed by `NBD_REP_ACK`. The metadata context > ID is transient and may vary across calls to `NBD_OPT_SET_META_CONTEXT`;
On 12/14/2016 12:09 PM, Wouter Verhelst wrote: > On Wed, Dec 14, 2016 at 03:08:40PM +0000, Alex Bligh wrote: >> (NB: I've already applied this and pushed it) > > Thanks. > >> * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries >> and add a count of queries so we can extend the command later (rather than >> rely on the length of option) > > Sure, that works. > >> * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero >> length query) list all contexts, as absence of any query is now simple. >> >> * Move definition of namespaces in the document to somewhere more appopriate. >> >> * Various other minor changes as discussed on the mailing list > > Right. I think we're getting close to a good spec now, for this thing. > > One thing I've been thinking about that we might want to add: > > There may be cases where a server, in performing the required calls to > be able to handle a BLOCK_STATUS request, will end up with more > information than the client asked; e.g., if the client asked information > in the base:allocation context on an extent at offset X of length Y, > then the server might conceivably do an lseek(SEEK_DATA) and/or > lseek(SEEK_HOLE). The result of that call might be that the file offset > will now point to a location Z, where Z > (X+Y). > > Currently, our spec says "the sum of the *length* fields MUST not be > greater than the overall *length* of request". This means that in > essense, the server then has to throw away the information it has on the > range Z - (X + Y). In case a client was interested in that information, > that seems like a waste. I would therefore like to remove that > requirement, by rephrasing that "sum of the *length* fields" thing into > something along the following lines: > > In case the server returns N extents, the sum of the *length* fields > of the first N-1 extents MUST NOT be greater than the overall *length* > of the request. The final extent MAY exceed the length of the request > if the server has that information anyway as a side effect of looking > up the required information and wishes to share it. > > This would then result in the fact that the "length" field in the > BLOCK_STATUS command would be little more than a hint, since we're > saying that a server can return more data than requested (if it's > available anyway) and less data than requested (if it would be too > resource-intensive to provide all the information). Not sure whether all > that makes much sense anymore, but hey. > > In addition, the combination of a server providing more information than > requested with a "REQ_ONE" flag and a length field of zero could be an > interesting way to enumerate a whole export, too. Essentially, we could > define that as a client saying "I'm interested in what the size of the > extent at offset X is, and what its properties are". > > Thoughts? > No complaints here. Even though it becomes a bit of a hint, it still establishes some limitations. It's good enough, I think. --js
> On 14 Dec 2016, at 20:18, Wouter Verhelst <w@uter.be> wrote: > >> + * the server MAY return a context consisting of a namespace and >> + a colon only (i.e. omitting the leaf-name) to indicate that >> + the namespace contains a large number of possible contexts >> + within that namespace (for instance a namespace `X-backup` with >> + contexts that indicate whether blocks were written after >> + a given date might accept queries of the form >> + `'X-backup:modifiedtime>[unixdate]'` where `[unixdate]` is an >> + arbitrary integer, and in this case it might simply >> + return `X-backup:`) > > This is way too detailed, I think. It should just allow namespaces to > define what _LIST_ may return, as long as the client is somehow able to > distill (through knowledge of the spec as well as the information sent > in reply to the _LIST_ command) all the metadata contexts it can > possibly select. OK, this part now reads: The server MUST either reply with an error (for instance `EINVAL` if the option is not supported), or reply with a list of `NBD_REP_META_CONTEXT` replies followed by `NBD_REP_ACK`. If zero queries are sent, then the server MUST return all the metadata contexts that are available to the client to select on the given export with `NBD_OPT_SET_META_CONTEXT`, save that: If one or more queries are sent, then the server MUST return those metadata contexts that are available to the client to select on the given export with `NBD_OPT_SET_META_CONTEXT`, and which match one or more of the queries given. The support of wildcarding within the leaf-name portion of the query string is dependent upon the namespace. In either case, however, for any given namespace the server MAY, instead of exhaustively listing every matching context available to select (or every context available to select where no query is given), send sufficient context records back to allow a client with knowledge of the namespace to select any context. Each namespace returned MUST still satisfy the rules for namespaces (i.e. they must begin with the relevant namespace, followed by a colon, then printable non-whitespace UTF-8 characters, with the entire string not exceeding 255 bytes). This may be helpful where a client can construct algorithmic queries. For instance, a client might reply simply with the namespace with no leaf-name (e.g. 'X-FooBar:') or with a range of values (e.g. 'X-ModifiedDate:20160310-20161214'). The semantics of such a reply are a matter for the definition of the namespace. Hope that works.
Hi Alex, On Thu, Dec 15, 2016 at 10:04:05AM +0000, Alex Bligh wrote: > > > On 14 Dec 2016, at 20:18, Wouter Verhelst <w@uter.be> wrote: > > > >> + * the server MAY return a context consisting of a namespace and > >> + a colon only (i.e. omitting the leaf-name) to indicate that > >> + the namespace contains a large number of possible contexts > >> + within that namespace (for instance a namespace `X-backup` with > >> + contexts that indicate whether blocks were written after > >> + a given date might accept queries of the form > >> + `'X-backup:modifiedtime>[unixdate]'` where `[unixdate]` is an > >> + arbitrary integer, and in this case it might simply > >> + return `X-backup:`) > > > > This is way too detailed, I think. It should just allow namespaces to > > define what _LIST_ may return, as long as the client is somehow able to > > distill (through knowledge of the spec as well as the information sent > > in reply to the _LIST_ command) all the metadata contexts it can > > possibly select. > > > OK, this part now reads: > > The server MUST either reply with an error (for instance `EINVAL` > if the option is not supported), or reply with a list of > `NBD_REP_META_CONTEXT` replies followed by `NBD_REP_ACK`. > > If zero queries are sent, then the server MUST return all > the metadata contexts that are available to the client to select > on the given export with `NBD_OPT_SET_META_CONTEXT`, save that: This reads a bit awkward. I would do: s/save that:/except as explained below/ > If one or more queries are sent, then the server MUST return > those metadata contexts that are available to the client to > select on the given export with `NBD_OPT_SET_META_CONTEXT`, > and which match one or more of the queries given. The > support of wildcarding within the leaf-name portion of > the query string is dependent upon the namespace. > > In either case, however, for any given namespace the > server MAY, instead of exhaustively listing every > matching context available to select (or every context > available to select where no query is given), send > sufficient context records back to allow a client with > knowledge of the namespace to select any context. Each > namespace returned MUST still satisfy the rules for > namespaces (i.e. they must begin with the relevant > namespace, followed by a colon, then printable non-whitespace > UTF-8 characters, Why restrict to non-whitespace characters? (printable makes sense...) > with the entire string not exceeding > 255 bytes). This may be helpful where a client can > construct algorithmic queries. For instance, a client might > reply simply with the namespace with no leaf-name (e.g. > 'X-FooBar:') or with a range of values (e.g. > 'X-ModifiedDate:20160310-20161214'). The semantics of > such a reply are a matter for the definition of the > namespace. > > Hope that works.
Wouter, > This reads a bit awkward. I would do: > > s/save that:/except as explained below/ Possibly a British English thing. Will fix. >> If one or more queries are sent, then the server MUST return >> those metadata contexts that are available to the client to >> select on the given export with `NBD_OPT_SET_META_CONTEXT`, >> and which match one or more of the queries given. The >> support of wildcarding within the leaf-name portion of >> the query string is dependent upon the namespace. >> >> In either case, however, for any given namespace the >> server MAY, instead of exhaustively listing every >> matching context available to select (or every context >> available to select where no query is given), send >> sufficient context records back to allow a client with >> knowledge of the namespace to select any context. Each >> namespace returned MUST still satisfy the rules for >> namespaces (i.e. they must begin with the relevant >> namespace, followed by a colon, then printable non-whitespace >> UTF-8 characters, > > Why restrict to non-whitespace characters? (printable makes sense...) Because the namespaces and leaf-names are already restricted to non-whitespace characters. I thought having tabs, line feeds, returns, em-space, en-space etc. was not particularly useful. I could be persuaded to relent re spaces.
On Thu, Dec 15, 2016 at 04:32:23PM +0000, Alex Bligh wrote: > Wouter, > > > This reads a bit awkward. I would do: > > > > s/save that:/except as explained below/ > > Possibly a British English thing. Will fix. It's mostly that the "save that:" suggests to me that the exception follows immediately (i.e., in the next paragraph). As you wrote it, it doesn't; it shows up in the paragraph after that, with a "however" clause. This was very confusing; I had to read it several times before I understood what you wanted to say. > >> If one or more queries are sent, then the server MUST return > >> those metadata contexts that are available to the client to > >> select on the given export with `NBD_OPT_SET_META_CONTEXT`, > >> and which match one or more of the queries given. The > >> support of wildcarding within the leaf-name portion of > >> the query string is dependent upon the namespace. > >> > >> In either case, however, for any given namespace the > >> server MAY, instead of exhaustively listing every > >> matching context available to select (or every context > >> available to select where no query is given), send > >> sufficient context records back to allow a client with > >> knowledge of the namespace to select any context. Each > >> namespace returned MUST still satisfy the rules for > >> namespaces (i.e. they must begin with the relevant > >> namespace, followed by a colon, then printable non-whitespace > >> UTF-8 characters, > > > > Why restrict to non-whitespace characters? (printable makes sense...) > > Because the namespaces and leaf-names are already restricted to > non-whitespace characters. I thought having tabs, line feeds, > returns, em-space, en-space etc. was not particularly useful. > I could be persuaded to relent re spaces. I could imagine that the context might include as part of its name a user-defined bit. If we're going to disallow whitespace, then that would mean namespaces would have to do all sorts of escaping etc. I don't think that's a good idea.
> On 15 Dec 2016, at 16:49, Wouter Verhelst <w@uter.be> wrote: > >> Because the namespaces and leaf-names are already restricted to >> non-whitespace characters. I thought having tabs, line feeds, >> returns, em-space, en-space etc. was not particularly useful. >> I could be persuaded to relent re spaces. > > I could imagine that the context might include as part of its name a > user-defined bit. If we're going to disallow whitespace, then that would > mean namespaces would have to do all sorts of escaping etc. I don't > think that's a good idea. So to be clear do you want to include all whitespace everywhere? Or just to the right of the colon in queries?
On Thu, Dec 15, 2016 at 05:34:48PM +0000, Alex Bligh wrote: > > > On 15 Dec 2016, at 16:49, Wouter Verhelst <w@uter.be> wrote: > > > >> Because the namespaces and leaf-names are already restricted to > >> non-whitespace characters. I thought having tabs, line feeds, > >> returns, em-space, en-space etc. was not particularly useful. > >> I could be persuaded to relent re spaces. > > > > I could imagine that the context might include as part of its name a > > user-defined bit. If we're going to disallow whitespace, then that would > > mean namespaces would have to do all sorts of escaping etc. I don't > > think that's a good idea. > > So to be clear do you want to include all whitespace > everywhere? Or just to the right of the colon in queries? Just in queries. I agree that in namespace names, it doesn't make much sense.
> On 16 Dec 2016, at 15:52, Wouter Verhelst <w@uter.be> wrote: > > On Thu, Dec 15, 2016 at 05:34:48PM +0000, Alex Bligh wrote: >> >>> On 15 Dec 2016, at 16:49, Wouter Verhelst <w@uter.be> wrote: >>> >>>> Because the namespaces and leaf-names are already restricted to >>>> non-whitespace characters. I thought having tabs, line feeds, >>>> returns, em-space, en-space etc. was not particularly useful. >>>> I could be persuaded to relent re spaces. >>> >>> I could imagine that the context might include as part of its name a >>> user-defined bit. If we're going to disallow whitespace, then that would >>> mean namespaces would have to do all sorts of escaping etc. I don't >>> think that's a good idea. >> >> So to be clear do you want to include all whitespace >> everywhere? Or just to the right of the colon in queries? > > Just in queries. I agree that in namespace names, it doesn't make much > sense. I've changed it so the returns from queries can now have spaces to the right of the colon if they aren't returning complete context names. Obviously if they are returning context names, by the definition of leaf-names they can't have spaces in the leaf-name. The queries themselves can consist of anything (currently), save that they must begin with a namespace and a colon.
On Fri, Dec 16, 2016 at 04:25:27PM +0000, Alex Bligh wrote: > > > On 16 Dec 2016, at 15:52, Wouter Verhelst <w@uter.be> wrote: > > > > On Thu, Dec 15, 2016 at 05:34:48PM +0000, Alex Bligh wrote: > >> > >>> On 15 Dec 2016, at 16:49, Wouter Verhelst <w@uter.be> wrote: > >>> > >>>> Because the namespaces and leaf-names are already restricted to > >>>> non-whitespace characters. I thought having tabs, line feeds, > >>>> returns, em-space, en-space etc. was not particularly useful. > >>>> I could be persuaded to relent re spaces. > >>> > >>> I could imagine that the context might include as part of its name a > >>> user-defined bit. If we're going to disallow whitespace, then that would > >>> mean namespaces would have to do all sorts of escaping etc. I don't > >>> think that's a good idea. > >> > >> So to be clear do you want to include all whitespace > >> everywhere? Or just to the right of the colon in queries? > > > > Just in queries. I agree that in namespace names, it doesn't make much > > sense. > > I've changed it so the returns from queries can now have spaces to > the right of the colon if they aren't returning complete context names. > Obviously if they are returning context names, by the definition of > leaf-names they can't have spaces in the leaf-name. Sorry, that still doesn't alleviate my concerns. Why are we restricting what metadata contexts can use for their names? We don't restrict what export names can use either, and I could imagine a metadata context wanting to use an export name as a metadata context name (e.g., "provide a bitmap on the diff between this export and the export defined by name XYZ"). As soon as we get to the right of the colon, what sits there is defined by the namespace, not by us. It doesn't feel right to add too many restrictions. I think the whole query string should be a "string" as laid out earlier in the document. Here, it actually *does* make sense for a string to be very long, because it's not meant to be human-readable. I've therefore removed that restriction as well as the "255 bytes max" one that you added, since I don't think they make much sense. That doesn't mean I can't be convinced otherwise by good arguments, but they'd have to be very good ones ;-) > The queries themselves can consist of anything (currently), > save that they must begin with a namespace and a colon. Sure.
> On 17 Dec 2016, at 08:34, Wouter Verhelst <w@uter.be> wrote: > > I've therefore removed that restriction as well as the "255 bytes max" > one that you added, since I don't think they make much sense. That > doesn't mean I can't be convinced otherwise by good arguments, but > they'd have to be very good ones ;-) But we were at cross purposes. I was talking about the return from queries, and you've adjusted namespaces (in toto). I've fixed up your edit slightly, and brought the return for queries in line with it. I hope this nit is now dead!
14.12.2016 21:17, Alex Bligh wrote: >> On 14 Dec 2016, at 17:55, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >> >>> Hmmm... Well in the '*' case, it's up to the namespace as to how it parses things, so '*' could be prohibited entirely or mean 'return the first 20 matching' or similar. So that's less of a problem. And 'set all' doesn't exist (unlike 'list all'). >>> >>> I think in the LIST case we should allow the server to omit contexts under certain circumstances, and allow SET to return ETOOBIG. >>> >> We can't prohibit '*' as query string as implementation defined. And we shouldn't (I think) define its meaning. > Those two statements appear to me to contradict each other. The current documentation *does* define the query string (to the right of the colon) as implementation defined. I'm thus not sure what you mean. > >> Opposite, we can define, that query must not select more than 20 contexts, but I'm not sure that this is good. > Each query can select from only one namespace in the current document (Wouter explained why). However, you can specify multiple queries. > > I don't think we need to define a hard limit of a particular number. > >> So, do you mean >> >> list('backup:modtime>*') -> empty list > The result of that would depend on however the 'backup' context was defined, meaning it had the options of: > * ETOOBIG > * Listing some subset of available contexts (arguably 'none' is a subset) > >> set('backup:modtime>*') -> ETOOBIG > Again, the result of that would depend on however the 'backup' context was defined, meaning it had the options of: > * ETOOBIG > * Listing some subset of available contexts (arguably 'none' is a subset) > ... and it need not make the same choice as above, but I agree it would be logical for it to do so. > > As any form of wildcarding within a query refers to one particular namespace (as a query by its nature specifies a single namespace), we don't have to worry about the way wildcarding works in the standard, as Wouter pointed out. We merely need to provide that ETOOBIG and listing a subset are acceptable responses. > > What we do need to decide is what the response to list() (i.e. a list with a zero length list of queries) does. It's currently defined to return every context from every namespace. Options would include > * ETOOBIG > * Listing some subset of available contexts (arguably 'none' is a subset) > * Allowing abbreviation of algorithmically specified contexts (e.g. in this case just returning 'backup:' to represent all available backup contexts) Shouldn't we add some flags to REP_META_CONTEXT, for client to be insure, is returned string a direct context name or some kind of wildcard?|| Just a flags field, with one flag defined for now: |NBD_REP_META_CONTEXT_LEAF and others reserved.|
14.12.2016 22:01, Alex Bligh wrote: > Wouter, > > (Our mails crossed and I've actually pushed something, but no matter) > >> On 14 Dec 2016, at 18:49, Wouter Verhelst <w@uter.be> wrote: >> >> What I was trying to say is that I think the result to _LIST_ with no >> queries should return all information the client needs to theoretically >> build the list of all possible contexts, even if that list may be >> so large as to be unfeasible for it to be built (e.g., in case of a >> cartesian product between all possible other contexts). I gave one >> example, but there may be more. >> >> My point is that if the query includes a namespace, the result should >> not be defined by our spec. If the query does not include a namespace, >> the result should be "complete" by whatever definition, but not >> unreasonable (i.e., don't just write a cartesian product to a client). >> >> This could allow an interactive client to present a user with a list of >> possible contexts before performing analysis on the block device, say. > OK, so first of all, one of the changes I made earlier was that now > each of the commands carries a list of queries, the way you list > everything is not 'having a query that doesn't contain a namespace' > but rather doing a _LIST_ with no queries at all. But that's semantics > and orthogonal to the main point. > > What I've proposed (and pushed - but feel free to alter it) is that I think, an overview in "## Metadata querying" should be updated too. > > 1. on _LIST_, the server can return fewer contexts than are available > if returning all of them would consume undue levels of resources. > > 2. on _LIST_ where the contexts are 'algorithmic', the server can > return e.g. 'X-Backup:' rather than 'X-Backup:modified>' and > every integer. > > 3. On _SET_ if too many contexts are requested, the server may return > an error (I think we need this anyway). > > That nearly does what you ask for, but I'm not sure how you any query > could 'return all the information the client needs to build > the list of all possible contexts'. For instance, in my backup > example 'X-Backup:modified>[integer]' doesn't itself tell you > anything, as you don't know whether the integer is a unix > date time, in seconds after the epoch, milliseconds or whatever. > What, surely, as a client you want to know is 'does it support > the X-Backup: extension because I've read the spec for that and > know that it has X-Backup:modified if so'. So I've suggested it > return 'X-Backup:' only in that case, in which case from that > (*and the spec*) you know how to build any query. >
A bit out of topic, but... > structured replies via `NBD_OPT_STRUCTURED_REPLY`. Conversely, if > structured replies are negotiated, the server MUST use a > structured reply for any response with a payload, and MUST NOT use > a simple reply for `NBD_CMD_READ` (even for the case of an early > `EINVAL` due to bad flags), but MAY use either a simple reply or a > structured reply to all other requests. What was the reason for it? Why not to negotiate forced structured read separately? Actually, this spec forces any server, which wants to implement structured reply implement structured read too. But what if it don't want to? If it only wants to implement BLOCK_STATUS? So, what about changing it, to allow BLOCK_STATUS (or other future structured replies) without structured read? Structured read is good only for sparse formats, when BLOCK_STATUS is more global. I understand, that servers may implement simple (and useless) one-chunk structured read, but I think that it is better to fix the spec, to not provoke servers use such workaround.
On 12/27/2016 08:09 AM, Vladimir Sementsov-Ogievskiy wrote: > A bit out of topic, but... > >> structured replies via `NBD_OPT_STRUCTURED_REPLY`. Conversely, if >> structured replies are negotiated, the server MUST use a >> structured reply for any response with a payload, and MUST NOT use >> a simple reply for `NBD_CMD_READ` (even for the case of an early >> `EINVAL` due to bad flags), but MAY use either a simple reply or a >> structured reply to all other requests. > > What was the reason for it? Why not to negotiate forced structured read > separately? Because the whole reason we (want to) introduce structured replies IS to fix the inability to do a partial read or an efficient read of zeroes. The fact that structured reads make other extensions possible is icing on the cake, but if you are going to implement structured replies at all, you might as well make reads do it (since reads are mandatory, while all other commands that utilize structured replies are optional). > Actually, this spec forces any server, which wants to > implement structured reply implement structured read too. But what if it > don't want to? If it only wants to implement BLOCK_STATUS? We intentionally do not want to permit such a server. Any server that wants to implement BLOCK_STATUS must also implement structured reads. > > So, what about changing it, to allow BLOCK_STATUS (or other future > structured replies) without structured read? Structured read is good > only for sparse formats, Not true - it is also good for error recovery even on non-sparse exports. The existing read command is flawed in that it cannot be implemented with partial read support - once the server has started sending data, it MUST finish sending the number of bytes requested by the client, which means the server MUST either buffer the read up front (to ensure no read error is possible once the data sending is started), or MUST disconnect if a read error is detected partway through. With structured reads, you can implement much more efficient servers that start sending the reply right away without buffering, but which can still error out on a read error partway through. > when BLOCK_STATUS is more global. I understand, > that servers may implement simple (and useless) one-chunk structured > read, but I think that it is better to fix the spec, to not provoke > servers use such workaround. To date, we don't know of ANY servers that implement structured replies at all, whether for structured reads or for BLOCK_STATUS. I'm working on qemu patches to make qemu implement both, and it will serve as an example of how easy or hard it is to implement things. But I see NO reason to weaken the spec to allow structured BLOCK_STATUS without structured reads.
Hi Vladimir, On Mon, Dec 26, 2016 at 05:52:54PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Shouldn't we add some flags to REP_META_CONTEXT, for client to be insure, is > returned string a direct context name or some kind of wildcard? Just a flags > field, with one flag defined for now: NBD_REP_META_CONTEXT_LEAF and others > reserved. I think it should be up to the metadata context namespace definition to define which syntax represents a direct context name and which represents a wildcard (if the latter are supported). A client which doesn't know what a given metadata context implements can't reasonably ask for information from that context anyway (since then the client wouldn't know what to do with the returned information), so it doesn't help much to add a flag here.
> On 28 Dec 2016, at 00:18, Wouter Verhelst <w@uter.be> wrote: > > On Mon, Dec 26, 2016 at 05:52:54PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> Shouldn't we add some flags to REP_META_CONTEXT, for client to be insure, is >> returned string a direct context name or some kind of wildcard? Just a flags >> field, with one flag defined for now: NBD_REP_META_CONTEXT_LEAF and others >> reserved. > > I think it should be up to the metadata context namespace definition to > define which syntax represents a direct context name and which > represents a wildcard (if the latter are supported). > > A client which doesn't know what a given metadata context implements > can't reasonably ask for information from that context anyway (since > then the client wouldn't know what to do with the returned information), > so it doesn't help much to add a flag here. I agree. Vladimir: if this isn't clear from the text, please suggest a change.
Vladimir, > On 26 Dec 2016, at 15:57, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > >> OK, so first of all, one of the changes I made earlier was that now >> each of the commands carries a list of queries, the way you list >> everything is not 'having a query that doesn't contain a namespace' >> but rather doing a _LIST_ with no queries at all. But that's semantics >> and orthogonal to the main point. >> >> What I've proposed (and pushed - but feel free to alter it) is that > > I think, an overview in "## Metadata querying" should be updated too. Sorry, which bit needs updating? The only bit in that more general section that mentions _LIST_ is: > First, during negotiation, if the client wishes to query metadata > during transmission, the client MUST select one or more metadata > contexts with the NBD_OPT_SET_META_CONTEXT command. If needed, the > client can use NBD_OPT_LIST_META_CONTEXT to list contexts that the > server supports. That still seems correct.
> On 27 Dec 2016, at 14:09, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > > What was the reason for it? Why not to negotiate forced structured read separately? Actually, this spec forces any server, which wants to implement structured reply implement structured read too. But what if it don't want to? If it only wants to implement BLOCK_STATUS? > > So, what about changing it, to allow BLOCK_STATUS (or other future structured replies) without structured read? Structured read is good only for sparse formats, when BLOCK_STATUS is more global. I understand, that servers may implement simple (and useless) one-chunk structured read, but I think that it is better to fix the spec, to not provoke servers use such workaround. In essence the current reply format is broken, because it does not provide a means of delivering an error mid reply AND expects fixed length replies. Block Status is the poster child for something that benefits from structured replies. So, BLOCK_STATUS requires structured replies. We thus have the choice between: * Allowing structured replies to be implemented on a per-command basis * Mandating that if structured replies are implemented, they are implemented universally The second option is far simpler, and is what the current structured reply extension sets out. It's also a good nudge to server implementers (me included) to implement structured replies. Your assumption that structured replies are only good for sparse formats is incorrect. It's good for anything that wants to include sensible error handling. As a server author, error handling at the moment is a pain. Most of us ALREADY break up large read requests (to avoid malloc() of huge amounts of memory), so we already have a read loop. If we don't then the 'one chunk' method is perfectly acceptable. If I implemented structured replies at all, I'd be sharing the code between multiple users in any case.
29.12.2016 19:04, Alex Bligh wrote: >> On 28 Dec 2016, at 00:18, Wouter Verhelst <w@uter.be> wrote: >> >> On Mon, Dec 26, 2016 at 05:52:54PM +0300, Vladimir Sementsov-Ogievskiy wrote: >>> Shouldn't we add some flags to REP_META_CONTEXT, for client to be insure, is >>> returned string a direct context name or some kind of wildcard? Just a flags >>> field, with one flag defined for now: NBD_REP_META_CONTEXT_LEAF and others >>> reserved. >> I think it should be up to the metadata context namespace definition to >> define which syntax represents a direct context name and which >> represents a wildcard (if the latter are supported). >> >> A client which doesn't know what a given metadata context implements >> can't reasonably ask for information from that context anyway (since >> then the client wouldn't know what to do with the returned information), >> so it doesn't help much to add a flag here. > I agree. > > Vladimir: if this isn't clear from the text, please suggest a change. > I'm ok with Wouter's arguments
29.12.2016 19:08, Alex Bligh wrote: > Vladimir, > >> On 26 Dec 2016, at 15:57, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >> >>> OK, so first of all, one of the changes I made earlier was that now >>> each of the commands carries a list of queries, the way you list >>> everything is not 'having a query that doesn't contain a namespace' >>> but rather doing a _LIST_ with no queries at all. But that's semantics >>> and orthogonal to the main point. >>> >>> What I've proposed (and pushed - but feel free to alter it) is that >> I think, an overview in "## Metadata querying" should be updated too. > Sorry, which bit needs updating? > > The only bit in that more general section that mentions _LIST_ is: > >> First, during negotiation, if the client wishes to query metadata >> during transmission, the client MUST select one or more metadata >> contexts with the NBD_OPT_SET_META_CONTEXT command. If needed, the >> client can use NBD_OPT_LIST_META_CONTEXT to list contexts that the >> server supports. > That still seems correct. > Hmm, I am not sure. Seems like it was out of topic.. Here the only thing may be s/list contexts that the server supports/available list contexts/, to be more consistent, but it is not important.
from current version:
>>> If an error occurs, the server SHOULD set the appropriate error
code in the error field of an error chunk. However, if the error does
not involve invalid usage (such as a request beyond the bounds of the
file), a server MAY reply with a single block status descriptor with
/length/ matching the requested length, and /status/ of 0 rather than
reporting the error.
- single block status descriptor for each context? Isn't it
implementation defined? Or we finally decided to force 0 status to be
safe default for all contexts? If it is so, it would be better to
describe this separately. However, personally, I'd prefer to not define
contexts internal semantics at all.
> On 11 Jan 2017, at 15:31, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > > >>> If an error occurs, the server SHOULD set the appropriate error code in the error field of an error chunk. However, if the error does not involve invalid usage (such as a request beyond the bounds of the file), a server MAY reply with a single block status descriptor with length matching the requested length, and status of 0 rather than reporting the error. > - single block status descriptor for each context? Isn't it implementation defined? Or we finally decided to force 0 status to be safe default for all contexts? If it is so, it would be better to describe this separately. However, personally, I'd prefer to not define contexts internal semantics at all. I think this is Wouter's wording, but I think 'a status appropriate to the context' would be better. Each context then needs to define what that is. Either that or 'the context's default status' and that should be in the definition of the context.
11.01.2017 22:00, Alex Bligh wrote: >> On 11 Jan 2017, at 15:31, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >> >>>>> If an error occurs, the server SHOULD set the appropriate error code in the error field of an error chunk. However, if the error does not involve invalid usage (such as a request beyond the bounds of the file), a server MAY reply with a single block status descriptor with length matching the requested length, and status of 0 rather than reporting the error. >> - single block status descriptor for each context? Isn't it implementation defined? Or we finally decided to force 0 status to be safe default for all contexts? If it is so, it would be better to describe this separately. However, personally, I'd prefer to not define contexts internal semantics at all. > I think this is Wouter's wording, but I think 'a status appropriate to the context' would be better. Each context then needs to define what that is. Either that or 'the context's default status' and that should be in the definition of the context. > Yes this is better. But is it actually needed to force contexts have some safe default? If context wants it may define such default without this requirement.. So, should it be requirement at all?
(Sorry, it may be a bit out of topic, but I hope it is comfortable for all that I just comment current version of the feature by answering cover letter of last related patch set) From |"NBD_OPT_LIST_META_CONTEXT| (9)": > The server MUST either reply with an error (for instance |EINVAL| if > the option is not supported), or reply with a list of > |NBD_REP_META_CONTEXT| replies followed by |NBD_REP_ACK|.: NBD_REP_ERR_UNSUP for the case in braces? Should it be normal option reply packet?
> On 12 Jan 2017, at 07:05, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > > Yes this is better. But is it actually needed to force contexts have some safe default? If context wants it may define such default without this requirement.. So, should it be requirement at all? I've changed this to: of the file), a server MAY reply with a single block status descriptor with *length* matching the requested length, rather than reporting the error; in this case the context MAY mandate the status returned.
> On 12 Jan 2017, at 11:43, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > > From "NBD_OPT_LIST_META_CONTEXT (9)": >> The server MUST either reply with an error (for instance EINVAL if the option is not supported), or reply with a list of NBD_REP_META_CONTEXT replies followed by NBD_REP_ACK.: > NBD_REP_ERR_UNSUP for the case in braces? Should it be normal option reply packet? Thanks, fixed.
12.01.2017 16:11, Alex Bligh wrote: >> On 12 Jan 2017, at 07:05, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >> >> Yes this is better. But is it actually needed to force contexts have some safe default? If context wants it may define such default without this requirement.. So, should it be requirement at all? > I've changed this to: > > of the file), a server MAY reply with a single block status > descriptor with *length* matching the requested length, rather than > reporting the error; in this case the context MAY mandate the > status returned. > > Hmm, I don't understand. So, it MAY mandate and therefore MAY NOT do it? And what client should think, if server replies with one chunk matching the request length and not mandate the status?
> On 13 Jan 2017, at 09:48, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > > 12.01.2017 16:11, Alex Bligh wrote: >>> On 12 Jan 2017, at 07:05, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >>> >>> Yes this is better. But is it actually needed to force contexts have some safe default? If context wants it may define such default without this requirement.. So, should it be requirement at all? >> I've changed this to: >> >> of the file), a server MAY reply with a single block status >> descriptor with *length* matching the requested length, rather than >> reporting the error; in this case the context MAY mandate the >> status returned. >> >> > > Hmm, I don't understand. So, it MAY mandate and therefore MAY NOT do it? And what client should think, if server replies with one chunk matching the request length and not mandate the status? Some contexts may mandate a particular value (so for instance the allocation context might mandate 0). Some contexts may not mandate a particular value, in which case the interpretation is dependent upon the context (just like any other status value). EG a context which returned an status of 7 if the range contained a prime number, and else 3, could carry on doing that. As it doesn't make sense to interpret status returns without an understanding of the particular context, we might as well simply extend this to 'beyond the range' returns - as I think you pointed out!
13.01.2017 13:29, Alex Bligh wrote: >> On 13 Jan 2017, at 09:48, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >> >> 12.01.2017 16:11, Alex Bligh wrote: >>>> On 12 Jan 2017, at 07:05, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >>>> >>>> Yes this is better. But is it actually needed to force contexts have some safe default? If context wants it may define such default without this requirement.. So, should it be requirement at all? >>> I've changed this to: >>> >>> of the file), a server MAY reply with a single block status >>> descriptor with *length* matching the requested length, rather than >>> reporting the error; in this case the context MAY mandate the >>> status returned. >>> >>> >> Hmm, I don't understand. So, it MAY mandate and therefore MAY NOT do it? And what client should think, if server replies with one chunk matching the request length and not mandate the status? > Some contexts may mandate a particular value (so for instance the allocation context might mandate 0). > > Some contexts may not mandate a particular value, in which case the interpretation is dependent upon the context (just like any other status value). EG a context which returned an status of 7 if the range contained a prime number, and else 3, could carry on doing that. > > As it doesn't make sense to interpret status returns without an understanding of the particular context, we might as well simply extend this to 'beyond the range' returns - as I think you pointed out! > >>> The status flags are intentionally defined so that a server MAY always safely report a status of 0 for any block - Actually, status flags are _not_ defined. (each context defines it's own status flags)
About extents: is 32bit length enough? We will have to send 4096 for empty 16tb disk..
> On 20 Jan 2017, at 17:04, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > > About extents: is 32bit length enough? We will have to send 4096 for empty 16tb disk.. The nbd protocol uniformly uses 32 bit lengths (for better or for worse). This is baked into the specification heavily. I'm not sure sending 4,096 items for an empty 16TB disk is any great hardship to be honest.
On 01/20/2017 12:00 PM, Alex Bligh wrote: > >> On 20 Jan 2017, at 17:04, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: >> >> About extents: is 32bit length enough? We will have to send 4096 for empty 16tb disk.. > > The nbd protocol uniformly uses 32 bit lengths (for better or for worse). This is baked into the specification heavily. > > I'm not sure sending 4,096 items for an empty 16TB disk is any great hardship to be honest. If it truly matters, an idea that has already been floated on the list is to add a new NBD_CMD_FLAG_SCALE (or some other spelling) that states that a particular command is sending lengths scaled by a particular factor (by the block size? obviously it would have to be better specified). Under this idea, the scaling factor would allow you to report larger extents for fewer back-and-forth operations, but it gets tricky if the scaling is allowed to get larger than the minimum granularity between extent changes. The other idea that has been floated is a way to report whether the entire export is known to be all-zero content at the time the client connects, which would trade 4096+ queries (you'd actually have to do more than 4096 queries, since length is < 4G, not <= 4G) with a single piece of information at the time the client connects. Either way, discussion on such enhancements are probably worth a new thread.
On Fri, Jan 20, 2017 at 01:35:10PM -0600, Eric Blake wrote: > On 01/20/2017 12:00 PM, Alex Bligh wrote: > > > >> On 20 Jan 2017, at 17:04, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > >> > >> About extents: is 32bit length enough? We will have to send 4096 for empty 16tb disk.. > > > > The nbd protocol uniformly uses 32 bit lengths (for better or for worse). This is baked into the specification heavily. > > > > I'm not sure sending 4,096 items for an empty 16TB disk is any great hardship to be honest. > > If it truly matters, an idea that has already been floated on the list > is to add a new NBD_CMD_FLAG_SCALE (or some other spelling) that states > that a particular command is sending lengths scaled by a particular > factor (by the block size? obviously it would have to be better > specified). Under this idea, the scaling factor would allow you to > report larger extents for fewer back-and-forth operations, but it gets > tricky if the scaling is allowed to get larger than the minimum > granularity between extent changes. Right, but people objected to it on grounds of it being too complicated (which I think was a fair point). > The other idea that has been floated is a way to report whether the > entire export is known to be all-zero content at the time the client > connects, which would trade 4096+ queries (you'd actually have to do > more than 4096 queries, since length is < 4G, not <= 4G) with a single > piece of information at the time the client connects. That's pretty special-case, and I objected to it on those grounds :-) However, I don't think there's anything necessarily wrong in changing the size of the length field in the reply header of the NBD_REPLY_TYPE_BLOCK_STATUS packet. That way, combined with the fact that we'd already stated that a server may send more information than was requested, a client could ask for metadata on an export, and if the extent is the whole size of the export, the server could say so in a single reply for all export sizes. I suppose it'd be slightly odd to have the request use 32-bit lengths and the response be expressed in 64-bit ones, but I suppose that's the price we pay to remain a backwards compatible and not too complicated a protocol. > Either way, discussion on such enhancements are probably worth a new > thread. Sure.
Reviving an old thread, to bring up questions based on a new push to implement this extension in qemu. On 12/14/2016 09:08 AM, Alex Bligh wrote: > > * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries > and add a count of queries so we can extend the command later (rather than > rely on the length of option) > > * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero > length query) list all contexts, as absence of any query is now simple. > > * Move definition of namespaces in the document to somewhere more appopriate. > > * Various other minor changes as discussed on the mailing list > > Signed-off-by: Alex Bligh <alex@alex.org.uk> > --- > doc/proto.md | 179 +++++++++++++++++++++++++++++++++++++---------------------- > 1 file changed, 112 insertions(+), 67 deletions(-) > > > +Metadata contexts are identified by their names. The name MUST > +consist of a namespace, followed by a colon, followed by a leaf-name. > +The namespace and the leaf-name must each consist entirely of > +printable non-whitespace UTF-8 characters other than colons, > +and be non-empty. The entire name (namespace, colon and leaf-name) > +MUST NOT exceed 255 bytes (and therefore botht he namespace and > +leaf-name are guaranteed to be smaller than 255 bytes). > + > +Namespaces MUST be consist of one of the following: > +- `base`, for metadata contexts defined by this document; > +- `nbd-server`, for metadata contexts defined by the > + implementation that accompanies this document (none > + currently); > +- `x-*`, where `*` can be replaced by any random string not We're inconsistent on whether extensions are 'x-' or 'X-'; we should tighten that up. (No real impact to implementing things - a server should just ignore namespaces it doesn't recognize, regardless of how that unknown namespace was spelled). > + containing colons, for local experiments. This SHOULD NOT be > + used by metadata contexts that are expected to be widely used. > +- A third-party namespace from the list below. > @@ -932,51 +961,58 @@ of the newstyle negotiation. > + If zero queries are sent, then the server MUST return all > + the metadata contexts it knows about. > + > + For details on the query string, see under `NBD_OPT_SET_META_CONTEXT`. > + > + The server MUST either reply with an error (for instance `EINVAL` > + if the option is not supported), or reply with a list of > + `NBD_REP_META_CONTEXT` replies followed by `NBD_REP_ACK`. > + The metadata context ID in these replies is reserved and SHOULD be > + set to zero; clients MUST disregard it. The question came up whether the server is required/permitted to diagnose bogus queries (a query for "bad" has no colon, and therefore cannot represent any namespace - is that required to be an error, or can the server silently ignore it and process the rest of the requests? Can the client rely on the server diagnosing bad requests, or must it be prepared for the server to just ignore bad requests?). Here's my thinking: A server implementation may want to vet only the first 5 characters of a request (because it only supports the namespace "base:"). If that server encounters a client asks for context "X-longname:", that is a valid request (so we should NOT reply with an error, but merely ignore it as an unknown namespace); but if a client asks for namespace "garbage", we have the option of whether to return an error. But for both of those client requests, there was no colon in the first 5 characters. For a server to robustly distinguish between the two, we have to read the entire request and search for a colon, to decide whether the missing colon warrants an error reply. But a client can request a name up to 4k in length (the NBD maximum string) - so taking the argument to an extreme, we have to manage a 4k string before making our decision. Reading 5 bytes fits easily into the stack, but reading 4k bytes onto the stack risks skipping the guard page on an OS with 4k page sizes, for less-than-stellar handling on stack overflow; and reading one byte at a time to check for colon is a pain compared to just deciding after 5 bytes that the rest of the string is irrelevant and skipping ahead in the data stream to the next point where any useful work might be performed. Thus, I'm arguing that for ease of server implementation, we should permit, but not require, the server to diagnose ill-formed requests that do not begin with namespace-colon; but that we MUST require that the server ignores unknown but well-formed requests rather than treating them as errors. But there is another alternative as well - instead of returning a two-part string where colon is the separator, and where the server must parse to locate the colon (or lack thereof), would it make sense to instead change the NBD_OPT_{LIST,SET}_META_CONTEXT and NBD_REP_META_CONTEXT field layout to have two separate length/strings per context (first is length/string for namespace, second is length/string for leaf name), where colon is no longer special (it is no longer possible for a client to pass an ill-formed request that lacks colon), and where the server can immediately tell that 'namespace_length != 5, therefore this is a request for a namespace I don't care about'? > - These two fields MAY be repeated as much as is necessary to select all > - metadata contexts the client is interested in. > + - 32 bits, length of export name. > + - String, name of export for which we wish to list metadata > + contexts. > + - 32 bits, number of queries > + - Zero or more queries, each being: > + - 32 bits, length of query > + - String, query to select metadata contexts. The syntax of this > + query is implementation-defined, except that it MUST start with a > + namespace and a colon. > + Also, is 32 bits as the length of the query really necessary? Given that NBD strings are capped at 4k, a 16-bit length is sufficient. It's not like we have padding to get things to natural alignments (if it were a matter of always sending 32-bit or 64-bit quantities on natural alignments, we'd have padding in a lot more places).
16.02.2018 15:35, Eric Blake wrote: > Reviving an old thread, to bring up questions based on a new push to > implement this extension in qemu. > > On 12/14/2016 09:08 AM, Alex Bligh wrote: >> >> * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries >> and add a count of queries so we can extend the command later >> (rather than >> rely on the length of option) >> >> * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to >> zero >> length query) list all contexts, as absence of any query is now >> simple. >> >> * Move definition of namespaces in the document to somewhere more >> appopriate. >> >> * Various other minor changes as discussed on the mailing list >> >> Signed-off-by: Alex Bligh <alex@alex.org.uk> >> --- >> doc/proto.md | 179 >> +++++++++++++++++++++++++++++++++++++---------------------- >> 1 file changed, 112 insertions(+), 67 deletions(-) >> > >> +Metadata contexts are identified by their names. The name MUST >> +consist of a namespace, followed by a colon, followed by a leaf-name. >> +The namespace and the leaf-name must each consist entirely of >> +printable non-whitespace UTF-8 characters other than colons, >> +and be non-empty. The entire name (namespace, colon and leaf-name) >> +MUST NOT exceed 255 bytes (and therefore botht he namespace and >> +leaf-name are guaranteed to be smaller than 255 bytes). >> + >> +Namespaces MUST be consist of one of the following: >> +- `base`, for metadata contexts defined by this document; >> +- `nbd-server`, for metadata contexts defined by the >> + implementation that accompanies this document (none >> + currently); >> +- `x-*`, where `*` can be replaced by any random string not > > We're inconsistent on whether extensions are 'x-' or 'X-'; we should > tighten that up. (No real impact to implementing things - a server > should just ignore namespaces it doesn't recognize, regardless of how > that unknown namespace was spelled). personally I'm for x- like in qmp, it's just nicer . However I'm sure, regardless of the choice, people will avoid both variant fora production names) > >> + containing colons, for local experiments. This SHOULD NOT be >> + used by metadata contexts that are expected to be widely used. >> +- A third-party namespace from the list below. > >> @@ -932,51 +961,58 @@ of the newstyle negotiation. >> + If zero queries are sent, then the server MUST return all >> + the metadata contexts it knows about. >> + >> + For details on the query string, see under >> `NBD_OPT_SET_META_CONTEXT`. >> + >> + The server MUST either reply with an error (for instance `EINVAL` >> + if the option is not supported), or reply with a list of >> + `NBD_REP_META_CONTEXT` replies followed by `NBD_REP_ACK`. >> + The metadata context ID in these replies is reserved and SHOULD be >> + set to zero; clients MUST disregard it. > > The question came up whether the server is required/permitted to > diagnose bogus queries (a query for "bad" has no colon, and therefore > cannot represent any namespace - is that required to be an error, or > can the server silently ignore it and process the rest of the > requests? Can the client rely on the server diagnosing bad requests, > or must it be prepared for the server to just ignore bad requests?). What do you mean by 'ignore'? Replying with empty list (if only one invalid query)? Hmm. I'm ok with either way. > > Here's my thinking: > > A server implementation may want to vet only the first 5 characters of > a request (because it only supports the namespace "base:"). If that > server encounters a client asks for context "X-longname:", that is a > valid request (so we should NOT reply with an error, but merely ignore > it as an unknown namespace); but if a client asks for namespace > "garbage", we have the option of whether to return an error. But for > both of those client requests, there was no colon in the first 5 > characters. For a server to robustly distinguish between the two, we > have to read the entire request and search for a colon, to decide > whether the missing colon warrants an error reply. But a client can > request a name up to 4k in length (the NBD maximum string) - so taking > the argument to an extreme, we have to manage a 4k string before > making our decision. Reading 5 bytes fits easily into the stack, but > reading 4k bytes onto the stack risks skipping the guard page on an OS > with 4k page sizes, for less-than-stellar handling on stack overflow; > and reading one byte at a time to check for colon is a pain compared > to just deciding after 5 bytes that the rest of the string is > irrelevant and skipping ahead in the data stream to the next point > where any useful work might be performed. Thus, I'm arguing that for > ease of server implementation, we should permit, but not require, the > server to diagnose ill-formed requests that do not begin with > namespace-colon; but that we MUST require that the server ignores > unknown but well-formed requests rather than treating them as errors. Completely agree. > > But there is another alternative as well - instead of returning a > two-part string where colon is the separator, and where the server > must parse to locate the colon (or lack thereof), would it make sense > to instead change the NBD_OPT_{LIST,SET}_META_CONTEXT and > NBD_REP_META_CONTEXT field layout to have two separate length/strings > per context (first is length/string for namespace, second is > length/string for leaf name), where colon is no longer special (it is > no longer possible for a client to pass an ill-formed request that > lacks colon), and where the server can immediately tell that > 'namespace_length != 5, therefore this is a request for a namespace I > don't care about'? Good idea. But it would be tricky thing to maintain backward compatibility with published versions of virtuozzo product. And finally our implementation would be more complex because of this simplification. > > >> - These two fields MAY be repeated as much as is necessary to >> select all >> - metadata contexts the client is interested in. >> + - 32 bits, length of export name. >> + - String, name of export for which we wish to list metadata >> + contexts. >> + - 32 bits, number of queries >> + - Zero or more queries, each being: >> + - 32 bits, length of query >> + - String, query to select metadata contexts. The syntax of this >> + query is implementation-defined, except that it MUST start >> with a >> + namespace and a colon. >> + > > Also, is 32 bits as the length of the query really necessary? Given > that NBD strings are capped at 4k, a 16-bit length is sufficient. > It's not like we have padding to get things to natural alignments (if > it were a matter of always sending 32-bit or 64-bit quantities on > natural alignments, we'd have padding in a lot more places). > Hm. Finally, you suggested several changes (including already merged 56c77720 :( ). Suggestions are logical. But if they will be accepted, we (Virtuozzo) will have to invent tricky hard-to-maintain code, to distinguish by third factors our already published versions. So, I suggest to add some negotiation flag (BLOCK_STATUS_FIXED ?), to negotiate your changes (including 56c77720). In upstream we can skip supporting BLOCK_STATUS without that special flag, but it will help us to distinguish our previous versions reliably.
On 02/16/2018 07:53 AM, Vladimir Sementsov-Ogievskiy wrote: > > Good idea. But it would be tricky thing to maintain backward > compatibility with published versions of virtuozzo product. And finally > our implementation would be more complex because of this simplification. > > > Hm. Finally, you suggested several changes (including already merged > 56c77720 :( ). Suggestions are logical. But if they will be accepted, we > (Virtuozzo) will have to invent tricky hard-to-maintain code, to > distinguish by third factors our already published versions. So, I > suggest to add some negotiation flag (BLOCK_STATUS_FIXED ?), to > negotiate your changes (including 56c77720). In upstream we can skip > supporting BLOCK_STATUS without that special flag, but it will help us > to distinguish our previous versions reliably. So if I understand correctly, you already have an implementation of NBD servers in the wild that support a preliminary version of NBD_OPT_SET_META_CONTEXT, but which do not necessarily obey the semantics in the current extension branch (and which may not be the final semantics by the time we promote the extension branch into the NBD spec overall). Also, the Virtuozzo product hasn't been mentioned on the NBD list, and while you are preparing patches to the public qemu based on the Virtuozzo product, I'm not sure if there is a public repository for the existing Virtuozzo product out in the wild. Is your product using NBD_OPT_LIST_META_CONTEXT, or _just_ NBD_OPT_SET_META_CONTEXT? Now, let's assess the ramifications for changing the current NBD proposal, before finalizing it into mainline. Once it is in mainline, all other implementations have to obey the spec; but we want to leave ourselves the freedom to tweak the extension prior to that point according to things learned while implementing it. With that said, what combinations are you hoping to still support? And how long do you expect to have mismatched versions in the field, or is it something where it is relatively easy to upgrade both client and server within a short timeframe? Here's what things look like if we do nothing, and the NBD_OPT side of the handshake stays compatible (as I understand it, your complaint about commit 56c77720 is that it affects only the transmission side; leaving aside for the moment that I have opened the door for even more changes that may affect the handshake side). old client, old server (what is in the field now): The Virtuozzo client is able to request status from the Virtuozzo server, no other NBD client or server cares because they pre-date block status new client, old server: Any new client that implements block_status, as currently documented, but talks to an existing Virtuozzo server, will get garbage replies to NBD_CMD_BLOCK_STATUS (status 5 instead of expected status 3). Virtuozzo's client could be taught to special-case things and accept BOTH values for status replies (as long as nothing else changes between the old implementation and when NBD finally accepts the extension into mainline) old client, new server: The Virtuozzo client talking to a non-Virtuozzo server that understands block_status as currently documented will get garbage replies to NBD_CMD_BLOCK_STATUS (status 3 instead of expected status 5). The Virtuozzo server cannot be patched to work around this, because it has no way to detect a Virtuozzo client differently from any other client. new client, new server: Because everyone follows the same documentation, it should be interoperable. It's ALWAYS risky to put an experimental item into production use; some precautions such as using intentionally higher numbers than normal may make it easier to identify the experimental versions (that is, if the proposal is to implement item '10', the experimental code implements item '10010'; then, any changes to what actually gets finalized into the semantics for item '10' will not affect continued use of the '10010' semantics). But since you didn't do that, my immediate reaction is that adding a 'BLOCK_STATUS_FIXED' to the NBD protocol sounds crazy (fixed compared to what? An unpublished early implementation?). But what we CAN do is specify that NBD_OPT_ value 10 is reserved for a (withdrawn) experimental extension (similar to how NBD_OPT_ value 4 for PEEK_EXPORT is skipped), and make NBD_OPT_SET_META_CONTEXT be 11. Then you have: old client, old server: client sends NBD_OPT 10, Virtuozzo server replies, and both sides use whatever you implemented even if it differs from the final NBD spec new client, old server: any compliant client sends NBD_OPT 11, Virtuozzo server doesn't recognize it, and you lose the ability to query block status. Additionally, you can teach the Virtuozzo client to try 11 first, and if it fails, fall back to trying 10 - the client then has to understand both old and new flavors, but can now talk to any version of the Virtuozzo server. old client, new server: the Virtuozzo client sends NBD_OPT 10, a compliant server doesn't recognize it, and you don't get to query block status, but you are no worse off than for any other server that doesn't understand block status. Additionally, you can teach the Virtuozzo server to accept opt 10 in addition to 11; the server now has to understand both old and new flavors, but can now talk to any version of the Virtuozzo client. new client, new server: the client sends NBD_OPT 11, the server replies, and they communicate according to the spec Or, we can revert the change in commit 56c77720, and keep NBD_REPLY_TYPE_BLOCK_STATUS at 5 (it leaves a hole in the NBD_REPLY_TYPE numbering, where 3 and 4 might be filled in by other future extensions, or permanently skipped). This works IF there are no OTHER incompatible changes made to the rest of the block status extension as part of promoting it to current (where we still haven't finished that debate, given my question on whether 32-bit lengths and colon-separated namespace:leaf in a single string is the best representation). So, I'd like some feedback from Alex or Wouter on which alternatives seem nicest at this point.
Hi, Sorry, I forgot to reply to this earlier. On Fri, Feb 16, 2018 at 10:10:59AM -0600, Eric Blake wrote: > On 02/16/2018 07:53 AM, Vladimir Sementsov-Ogievskiy wrote: > > Good idea. But it would be tricky thing to maintain backward > > compatibility with published versions of virtuozzo product. And finally > > our implementation would be more complex because of this simplification. > > > Hm. Finally, you suggested several changes (including already merged > > 56c77720 :( ). Suggestions are logical. But if they will be accepted, we > > (Virtuozzo) will have to invent tricky hard-to-maintain code, to > > distinguish by third factors our already published versions. Hrm. I wasn't aware you'd already published those versions; if you had told us, we could've reviewed the spec at that point and either merge it or update it to incorporate whatever changes you think seem like they are necessary. Note that the section "Experimental extensions" contains the following wording: In addition to the normative elements of the specification set out herein, various experimental non-normative extensions have been proposed. These may not be implemented in any known server or client, and are subject to change at any point. A full implementation may require changes to the specifications, or cause the specifications to be withdrawn altogether. [...] Implementors of these extensions are strongly suggested to contact the mailinglist in order to help fine-tune the specifications before committing to a particular implementation. i.e., what's in an "extension-" branch can be described as "we're thinking about it and it will probably happen as written, but we're not entirely sure yet". The idea behind this is that we don't want to limit ourselves to things that haven't been implemented (but that the fact of "implementing something" causes it to get written into stone, sortof). This is different from how most standards bodies work, I'm sure, but it seemed to work so far, and I wish you had let us know that the work you were doing was about to be reaching customers. Anyway. [...] > Also, the Virtuozzo product hasn't been mentioned on the NBD list, and while > you are preparing patches to the public qemu based on the Virtuozzo product, > I'm not sure if there is a public repository for the existing Virtuozzo > product out in the wild. Is your product using NBD_OPT_LIST_META_CONTEXT, > or _just_ NBD_OPT_SET_META_CONTEXT? +1. What's published currently? [...] > Or, we can revert the change in commit 56c77720, and keep > NBD_REPLY_TYPE_BLOCK_STATUS at 5 (it leaves a hole in the NBD_REPLY_TYPE > numbering, where 3 and 4 might be filled in by other future extensions, or > permanently skipped). This works IF there are no OTHER incompatible changes > made to the rest of the block status extension as part of promoting it to > current (where we still haven't finished that debate, given my question on > whether 32-bit lengths and colon-separated namespace:leaf in a single string > is the best representation). > > So, I'd like some feedback from Alex or Wouter on which alternatives seem > nicest at this point. I'm thinking that reverting at least the number change seems like a good idea. If Vladimir can shed some light on what's been published so far, we can see what damage has been done and try to limit further damage. It makes no sense to ignore that the spec has been implemented; the whole point of writing a spec is so that third parties can implement it and be compatible. If we ignore that just because there was a misunderstanding, then I think we're throwing away the kid with the bathwater. Since I don't think a gap in numbers is that much of a problem, I'm happy reverting the renumbering part of 56c77720 and keep it at 5. I'd also like to know what it is exactly that Virtuozzo implemented, so we can update the extension-blockstatus (if necessary) and then merge that to master. However, for future reference, Vladimir, I would prefer it if you could give us a heads-up if you're getting close to releasing something to the public that's still in an experimental branch, so that we can make updates if necessary and merge it to master. Thanks,
On 02/28/2018 07:08 AM, Wouter Verhelst wrote: >> Or, we can revert the change in commit 56c77720, and keep >> NBD_REPLY_TYPE_BLOCK_STATUS at 5 (it leaves a hole in the NBD_REPLY_TYPE >> numbering, where 3 and 4 might be filled in by other future extensions, or >> permanently skipped). This works IF there are no OTHER incompatible changes >> made to the rest of the block status extension as part of promoting it to >> current (where we still haven't finished that debate, given my question on >> whether 32-bit lengths and colon-separated namespace:leaf in a single string >> is the best representation). >> >> So, I'd like some feedback from Alex or Wouter on which alternatives seem >> nicest at this point. > > I'm thinking that reverting at least the number change seems like a good > idea. If Vladimir can shed some light on what's been published so far, > we can see what damage has been done and try to limit further damage. > > It makes no sense to ignore that the spec has been implemented; the > whole point of writing a spec is so that third parties can implement it > and be compatible. If we ignore that just because there was a > misunderstanding, then I think we're throwing away the kid with the > bathwater. > > Since I don't think a gap in numbers is that much of a problem, I'm > happy reverting the renumbering part of 56c77720 and keep it at 5. I'd > also like to know what it is exactly that Virtuozzo implemented, so we > can update the extension-blockstatus (if necessary) and then merge that > to master. Okay, I've gone ahead and reverted the renumbering; NBD_INFO_BLOCK_SIZE is back to 5 (we have a gap in the numbering, but it doesn't hurt). There's still the open question of whether we want to keep a single string that requires parsing into "namespace:leaf" sections, or to split things into two separate strings for ease of handing an unknown namespace; and whether a 16-bit length is sufficient instead of burning 32-bit length fields. But if we decide that any of those changes are worth making, we'd also want to do something else to make it obviously different from Virtuozzo's initial implementation (my idea about reserving NBD_OPT_ value 10 as unused, and renumbering NBD_OPT_SET_META_CONTEXT to 11 along with any other changes we make, is still reasonable for that point). I'd also like to promote extension-structured-reply to current (now that it is implemented by qemu), while keeping extension-blockstatus as a separate branch for just a bit longer. I'll post another thread on other possible doc changes that have come up during this thread.
On 02/28/2018 02:26 PM, Eric Blake wrote: > > Okay, I've gone ahead and reverted the renumbering; NBD_INFO_BLOCK_SIZE > is back to 5 It helps if I don't copy-and-paste the wrong thing. NBD_INFO_BLOCK_SIZE remains 3 (as it has always been), NBD_REPLY_TYPE_BLOCK_STATUS is reverted back to 5 (undoing part of commit 56c77720).
Hi, Sorry too for long delay. 28.02.2018 16:08, Wouter Verhelst wrote: > Hi, > > Sorry, I forgot to reply to this earlier. > > On Fri, Feb 16, 2018 at 10:10:59AM -0600, Eric Blake wrote: >> On 02/16/2018 07:53 AM, Vladimir Sementsov-Ogievskiy wrote: >>> Good idea. But it would be tricky thing to maintain backward >>> compatibility with published versions of virtuozzo product. And finally >>> our implementation would be more complex because of this simplification. >>> Hm. Finally, you suggested several changes (including already merged >>> 56c77720 :( ). Suggestions are logical. But if they will be accepted, we >>> (Virtuozzo) will have to invent tricky hard-to-maintain code, to >>> distinguish by third factors our already published versions. > Hrm. I wasn't aware you'd already published those versions; if you had > told us, we could've reviewed the spec at that point and either merge it > or update it to incorporate whatever changes you think seem like they > are necessary. > > Note that the section "Experimental extensions" contains the following > wording: > > In addition to the normative elements of the specification set out > herein, various experimental non-normative extensions have been > proposed. These may not be implemented in any known server or > client, and are subject to change at any point. A full > implementation may require changes to the specifications, or cause > the specifications to be withdrawn altogether. > > [...] > > Implementors of these extensions are strongly suggested to contact > the mailinglist in order to help fine-tune the specifications before > committing to a particular implementation. > > i.e., what's in an "extension-" branch can be described as "we're > thinking about it and it will probably happen as written, but we're not > entirely sure yet". The idea behind this is that we don't want to limit > ourselves to things that haven't been implemented (but that the fact of > "implementing something" causes it to get written into stone, sortof). > > This is different from how most standards bodies work, I'm sure, but it > seemed to work so far, and I wish you had let us know that the work you > were doing was about to be reaching customers. Anyway. Oh, it's my big mistake, I understand now. Discussion was (and continues to be=) too long, so we couldn't wait, but I of course should have announce it somehow, or use bigger experimental constants for the protocol. > > [...] >> Also, the Virtuozzo product hasn't been mentioned on the NBD list, and while >> you are preparing patches to the public qemu based on the Virtuozzo product, >> I'm not sure if there is a public repository for the existing Virtuozzo >> product out in the wild. Is your product using NBD_OPT_LIST_META_CONTEXT, >> or _just_ NBD_OPT_SET_META_CONTEXT? > +1. What's published currently? No, we don't have public repo. But our implementation is similar (in protocol) with sent patches. My first published implementation was written one year ago, in Feb 2017. I'm afraid I've missed later changes of the spec, will check. [hm, it's not very easy to do..] _LIST_ is implemented in server, but only _SET_ is used in client. ---- checking for changes ---- Comparing 3118b21df (in extension-blockstatus, committed in Jan 2017) with current state of extension-blockstatus, don't see any significant changes for blockstatus, except command number 5->3, which we decided to rollback (thanks for this). > > [...] >> Or, we can revert the change in commit 56c77720, and keep >> NBD_REPLY_TYPE_BLOCK_STATUS at 5 (it leaves a hole in the NBD_REPLY_TYPE >> numbering, where 3 and 4 might be filled in by other future extensions, or >> permanently skipped). This works IF there are no OTHER incompatible changes >> made to the rest of the block status extension as part of promoting it to >> current (where we still haven't finished that debate, given my question on >> whether 32-bit lengths and colon-separated namespace:leaf in a single string >> is the best representation). >> >> So, I'd like some feedback from Alex or Wouter on which alternatives seem >> nicest at this point. > I'm thinking that reverting at least the number change seems like a good > idea. If Vladimir can shed some light on what's been published so far, > we can see what damage has been done and try to limit further damage. > > It makes no sense to ignore that the spec has been implemented; the > whole point of writing a spec is so that third parties can implement it > and be compatible. If we ignore that just because there was a > misunderstanding, then I think we're throwing away the kid with the > bathwater. > > Since I don't think a gap in numbers is that much of a problem, I'm > happy reverting the renumbering part of 56c77720 and keep it at 5. I'd > also like to know what it is exactly that Virtuozzo implemented, so we > can update the extension-blockstatus (if necessary) and then merge that > to master. > > However, for future reference, Vladimir, I would prefer it if you could > give us a heads-up if you're getting close to releasing something to > the public that's still in an experimental branch, so that we can make > updates if necessary and merge it to master. > > Thanks, >
Reviving this discussion, as it still seems useful to incorporate now that BLOCK_STATUS is only recently part of the standard. On 12/14/2016 11:09 AM, Wouter Verhelst wrote: > One thing I've been thinking about that we might want to add: > > There may be cases where a server, in performing the required calls to > be able to handle a BLOCK_STATUS request, will end up with more > information than the client asked; e.g., if the client asked information > in the base:allocation context on an extent at offset X of length Y, > then the server might conceivably do an lseek(SEEK_DATA) and/or > lseek(SEEK_HOLE). The result of that call might be that the file offset > will now point to a location Z, where Z > (X+Y). > > Currently, our spec says "the sum of the *length* fields MUST not be > greater than the overall *length* of request". This means that in > essense, the server then has to throw away the information it has on the > range Z - (X + Y). In case a client was interested in that information, > that seems like a waste. I would therefore like to remove that > requirement, by rephrasing that "sum of the *length* fields" thing into > something along the following lines: > > In case the server returns N extents, the sum of the *length* fields > of the first N-1 extents MUST NOT be greater than the overall *length* > of the request. The final extent MAY exceed the length of the request > if the server has that information anyway as a side effect of looking > up the required information and wishes to share it. > > This would then result in the fact that the "length" field in the > BLOCK_STATUS command would be little more than a hint, since we're > saying that a server can return more data than requested (if it's > available anyway) and less data than requested (if it would be too > resource-intensive to provide all the information). Not sure whether all > that makes much sense anymore, but hey. > > In addition, the combination of a server providing more information than > requested with a "REQ_ONE" flag and a length field of zero could be an > interesting way to enumerate a whole export, too. Essentially, we could > define that as a client saying "I'm interested in what the size of the > extent at offset X is, and what its properties are". > > Thoughts? >
[resend with updated list address and pruning addresses that bounced] Reviving this discussion, as it still seems useful to incorporate now that BLOCK_STATUS is only recently part of the standard. On 12/14/2016 11:09 AM, Wouter Verhelst wrote: > One thing I've been thinking about that we might want to add: > > There may be cases where a server, in performing the required calls to > be able to handle a BLOCK_STATUS request, will end up with more > information than the client asked; e.g., if the client asked information > in the base:allocation context on an extent at offset X of length Y, > then the server might conceivably do an lseek(SEEK_DATA) and/or > lseek(SEEK_HOLE). The result of that call might be that the file offset > will now point to a location Z, where Z > (X+Y). > > Currently, our spec says "the sum of the *length* fields MUST not be > greater than the overall *length* of request". This means that in > essense, the server then has to throw away the information it has on the > range Z - (X + Y). In case a client was interested in that information, > that seems like a waste. I would therefore like to remove that > requirement, by rephrasing that "sum of the *length* fields" thing into > something along the following lines: > > In case the server returns N extents, the sum of the *length* fields > of the first N-1 extents MUST NOT be greater than the overall *length* > of the request. The final extent MAY exceed the length of the request > if the server has that information anyway as a side effect of looking > up the required information and wishes to share it. > > This would then result in the fact that the "length" field in the > BLOCK_STATUS command would be little more than a hint, since we're > saying that a server can return more data than requested (if it's > available anyway) and less data than requested (if it would be too > resource-intensive to provide all the information). Not sure whether all > that makes much sense anymore, but hey. > > In addition, the combination of a server providing more information than > requested with a "REQ_ONE" flag and a length field of zero could be an > interesting way to enumerate a whole export, too. Essentially, we could > define that as a client saying "I'm interested in what the size of the > extent at offset X is, and what its properties are". > > Thoughts? >
On Thu, May 03, 2018 at 12:26:36PM -0500, Eric Blake wrote: > [resend with updated list address and pruning addresses that bounced] > > Reviving this discussion, as it still seems useful to incorporate now that > BLOCK_STATUS is only recently part of the standard. Yeah. I thought we'd incorporated it already at the time, but apparently not. I still think this is a good idea, and that we should do it. > On 12/14/2016 11:09 AM, Wouter Verhelst wrote: > > > One thing I've been thinking about that we might want to add: > > > > There may be cases where a server, in performing the required calls to > > be able to handle a BLOCK_STATUS request, will end up with more > > information than the client asked; e.g., if the client asked information > > in the base:allocation context on an extent at offset X of length Y, > > then the server might conceivably do an lseek(SEEK_DATA) and/or > > lseek(SEEK_HOLE). The result of that call might be that the file offset > > will now point to a location Z, where Z > (X+Y). > > > > Currently, our spec says "the sum of the *length* fields MUST not be > > greater than the overall *length* of request". This means that in > > essense, the server then has to throw away the information it has on the > > range Z - (X + Y). In case a client was interested in that information, > > that seems like a waste. I would therefore like to remove that > > requirement, by rephrasing that "sum of the *length* fields" thing into > > something along the following lines: > > > > In case the server returns N extents, the sum of the *length* fields > > of the first N-1 extents MUST NOT be greater than the overall *length* > > of the request. The final extent MAY exceed the length of the request > > if the server has that information anyway as a side effect of looking > > up the required information and wishes to share it. > > > > This would then result in the fact that the "length" field in the > > BLOCK_STATUS command would be little more than a hint, since we're > > saying that a server can return more data than requested (if it's > > available anyway) and less data than requested (if it would be too > > resource-intensive to provide all the information). Not sure whether all > > that makes much sense anymore, but hey. > > > > In addition, the combination of a server providing more information than > > requested with a "REQ_ONE" flag and a length field of zero could be an > > interesting way to enumerate a whole export, too. Essentially, we could > > define that as a client saying "I'm interested in what the size of the > > extent at offset X is, and what its properties are". > > > > Thoughts? > > > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3266 > Virtualization: qemu.org | libvirt.org >
diff --git a/doc/proto.md b/doc/proto.md index e03f434..6955c76 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -683,15 +683,20 @@ by other users, so has been moved out of the "experimental" section. ## Metadata querying -With the availability of sparse storage formats, it is often needed to -query the status of a particular range and read only those blocks of -data that are actually present on the block device. +It is often helpful for the client to be able to query the status +of a range of blocks. The nature of the status that can be +queried is in part implementation dependent. For instance, +the status might represent: -Some storage formats and operations over such formats express a -concept of data dirtiness. Whether the operation is block device -mirroring, incremental block device backup or any other operation with -a concept of data dirtiness, they all share a need to provide a list -of ranges that this particular operation treats as dirty. +* in a sparse storage format, whether the relevant blocks are + actually present on the backing device for the export; or + +* whether the relevant blocks are 'dirty'; some storage formats + and operations over such formats express a concept of data dirtiness. + Whether the operation is block device mirroring, incremental block + device backup or any other operation with a concept of data dirtiness, + they all share a need to provide a list of ranges that this + particular operation treats as dirty. To provide such classes of information, the NBD protocol has a generic framework for querying metadata; however, its use must first be @@ -699,9 +704,11 @@ negotiated, and one or more metadata contexts must be selected. The procedure works as follows: -- First, during negotiation, the client MUST select one or more metadata +- First, during negotiation, if the client wishes to query metadata + during transmission, the client MUST select one or more metadata contexts with the `NBD_OPT_SET_META_CONTEXT` command. If needed, the client - can use `NBD_OPT_LIST_META_CONTEXT` to list contexts. + can use `NBD_OPT_LIST_META_CONTEXT` to list contexts that the server + supports. - During transmission, a client can then indicate interest in metadata for a given region by way of the `NBD_CMD_BLOCK_STATUS` command, where *offset* and *length* indicate the area of interest. The server MUST @@ -715,13 +722,37 @@ The procedure works as follows: of type `NBD_REPLY_TYPE_BLOCK_STATUS`. A client MUST NOT use `NBD_CMD_BLOCK_STATUS` unless it selected a -nonzero number of metadata contexts during negotiation. Servers SHOULD -reply to clients doing so anyway with `EINVAL`. +non-zero number of metadata contexts during negotiation. Servers SHOULD +reply to clients sending `NBD_CMD_BLOCK_STATUS without +selecting metadata contexts with `EINVAL`. -The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent by a +The reply to the `NBD_CMD_BLOCK_STATUS` request MUST be sent as a structured reply; this implies that in order to use metadata querying, structured replies MUST be negotiated first. +Metadata contexts are identified by their names. The name MUST +consist of a namespace, followed by a colon, followed by a leaf-name. +The namespace and the leaf-name must each consist entirely of +printable non-whitespace UTF-8 characters other than colons, +and be non-empty. The entire name (namespace, colon and leaf-name) +MUST NOT exceed 255 bytes (and therefore botht he namespace and +leaf-name are guaranteed to be smaller than 255 bytes). + +Namespaces MUST be consist of one of the following: +- `base`, for metadata contexts defined by this document; +- `nbd-server`, for metadata contexts defined by the + implementation that accompanies this document (none + currently); +- `x-*`, where `*` can be replaced by any random string not + containing colons, for local experiments. This SHOULD NOT be + used by metadata contexts that are expected to be widely used. +- A third-party namespace from the list below. + +Third-party implementations can register additional namespaces by simple +request to the mailing-list. The following additional third-party namespaces +are currently registered: +* (none) + This standard defines exactly one metadata context; it is called `base:allocation`, and it provides information on the basic allocation status of extents (that is, whether they are allocated at all in a @@ -918,9 +949,7 @@ of the newstyle negotiation. - `NBD_OPT_LIST_META_CONTEXT` (10) Return a list of `NBD_REP_META_CONTEXT` replies, one per context, - followed by an `NBD_REP_ACK`. If a server replies to such a request - with no error message, clients MAY send NBD_CMD_BLOCK_STATUS - commands during the transmission phase. + followed by an `NBD_REP_ACK`. If the query string is syntactically invalid, the server SHOULD send `NBD_REP_ERR_INVALID`. If the query string is syntactically valid @@ -932,51 +961,58 @@ of the newstyle negotiation. SHOULD send `NBD_REP_ERR_INVALID`. Data: - - 32 bits, length of export name - - String, name of export for which we wish to list or select metadata + - 32 bits, length of export name. + - String, name of export for which we wish to list metadata contexts. - - 32 bits, length of query - - String, query to select a subset of the available metadata - contexts. If this is not specified (i.e., the "length of query" - field is 0 and no query is sent), then the server MUST send all - the metadata contexts it knows about. If specified, this query - string MUST start with a name that uniquely identifies a server - implementation; e.g., the reference implementation that - accompanies this document would support query strings starting - with 'nbd-server:' - - The server MUST reply with a list of `NBD_REP_META_CONTEXT` replies, - followed by `NBD_REP_ACK`. The metadata context ID in these replies - is reserved and SHOULD be set to zero; clients SHOULD disregard it. + - 32 bits, number of queries + - Zero or more queries, each being: + - 32 bits, length of query. + - String, query to list a subset of the available metadata + contexts. + + If zero queries are sent, then the server MUST return all + the metadata contexts it knows about. + + For details on the query string, see under `NBD_OPT_SET_META_CONTEXT`. + + The server MUST either reply with an error (for instance `EINVAL` + if the option is not supported), or reply with a list of + `NBD_REP_META_CONTEXT` replies followed by `NBD_REP_ACK`. + The metadata context ID in these replies is reserved and SHOULD be + set to zero; clients MUST disregard it. - `NBD_OPT_SET_META_CONTEXT` (11) Change the set of active metadata contexts. Issuing this command replaces all previously-set metadata contexts; clients must ensure - that all metadata contexts they're interested in are selected with + that all metadata contexts they are interested in are selected with the final query that they sent. + A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless + within the negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` + at least once, and the final time it was sent, the server + responded without an error, and returned at least one metadata + context. + Data: - - 32 bits, length of query - - String, query to select metadata contexts. The syntax of this - query is implementation-defined, except that it MUST start with a - namespace. This namespace may be one of the following: - - `base:`, for metadata contexts defined by this document; - - `nbd-server:`, for metadata contexts defined by the - implementation that accompanies this document (none - currently); - - `x-*:`, where `*` can be replaced by any random string not - containing colons, for local experiments. This SHOULD NOT be - used by metadata contexts that are expected to be widely used. - - third-party implementations can register additional - namespaces by simple request to the mailinglist. - - These two fields MAY be repeated as much as is necessary to select all - metadata contexts the client is interested in. + - 32 bits, length of export name. + - String, name of export for which we wish to list metadata + contexts. + - 32 bits, number of queries + - Zero or more queries, each being: + - 32 bits, length of query + - String, query to select metadata contexts. The syntax of this + query is implementation-defined, except that it MUST start with a + namespace and a colon. + + If zero queries are sent, the server MUST select no metadata contexts. The server MUST reply with a number of `NBD_REP_META_CONTEXT` replies, one for each selected metadata context, each with a unique - metadata context ID. It is not an error if a + metadata context ID, followed by `NBD_REP_ACK`. The metadata context + ID is transient and may vary across calls to `NBD_OPT_SET_META_CONTEXT`; + clients MUST therefore treat the ID as an opaque value and not (for + instance) cache it between connections. It is not an error if a `NBD_OPT_SET_META_CONTEXT` option does not select any metadata context, provided the client then does not attempt to issue `NBD_CMD_BLOCK_STATUS` commands. @@ -1071,9 +1107,10 @@ The `base:allocation` metadata context is the basic "allocated at all" metadata context. If an extent is marked with `NBD_STATE_HOLE` at that context, this means that the given extent is not allocated in the backend storage, and that writing to the extent MAY result in the ENOSPC -error. This supports sparse file semantics on the server side. If a -server has only one metadata context (the default), then writing to an -extent which has `NBD_STATE_HOLE` clear MUST NOT fail with ENOSPC. +error. This supports sparse file semantics on the server side. +If a server supports the `base:allocation` metadata context, then writing +to an extent which has `NBD_STATE_HOLE` clear MUST NOT fail with ENOSPC +unless for reasons specified in the definition of another context. It defines the following flags for the flags field: @@ -1082,20 +1119,21 @@ It defines the following flags for the flags field: `ENOSPC` error); if clear, the block is allocated or the server could not otherwise determine its status. Note that the use of `NBD_CMD_TRIM` is related to this status, but that the server MAY - report a hole even where trim has not been requested, and also that a - server MAY report metadata even where a trim has been requested. + report a hole even where `NBD_CMD_TRIM` has not been requested, and + also that a server MAY report that the block is allocated even where + `NBD_CMD_TRIM` has been requested. - `NBD_STATE_ZERO` (bit 1): if set, the block contents read as all zeroes; if clear, the block contents are not known. Note that the use of `NBD_CMD_WRITE_ZEROES` is related to this status, but that the - server MAY report zeroes even where write zeroes has not been + server MAY report zeroes even where `NBD_CMD_WRITE_ZEROES` has not been requested, and also that a server MAY report unknown content even - where write zeroes has been requested. + where `NBD_CMD_WRITE_ZEROES` has been requested. It is not an error for a server to report that a region of the export has both `NBD_STATE_HOLE` set and `NBD_STATE_ZERO` clear. The -contents of such an area is undefined, and may not be stable; -clients who are aware of the existence of such a region SHOULD NOT -read it. +contents of such an area are undefined, and a client +reading such an area should make no assumption as to its contents +or stability. For the `base:allocation` context, the remainder of the flags field is reserved. Servers SHOULD set it to all-zero; clients MUST ignore unknown @@ -1150,9 +1188,10 @@ valid may depend on negotiation during the handshake phase. `EOVERFLOW` error chunk, if the request length is too large. - bit 3, `NBD_CMD_FLAG_REQ_ONE`; valid during `NBD_CMD_BLOCK_STATUS`. If set, the client is interested in only one extent per metadata - context. If this flag is present, the server SHOULD NOT send metadata - on more than one extent in the reply. Clients SHOULD NOT use this flag - on multiple requests for successive regions in the export. + context. If this flag is present, the server MUST NOT send metadata + on more than one extent in the reply. Client implementors should note + that using this flag on multiple contiguous requests is likely to be + inefficient. ##### Structured reply flags @@ -1225,7 +1264,7 @@ interpret the "length" bytes of payload. *length* MUST be 4 + (a positive integer multiple of 8). This reply represents a series of consecutive block descriptors where the sum - of the lengths of the descriptors MUST not be greater than the + of the length fields within the descriptors MUST not be greater than the length of the original request. This chunk type MUST appear exactly once per metadata ID in a structured reply. @@ -1236,14 +1275,15 @@ interpret the "length" bytes of payload. and is followed by a list of one or more descriptors, each with this layout: - * 32 bits, length (unsigned, MUST NOT be zero) + * 32 bits, length of the extent to which the status below + applies (unsigned, MUST be non-zero) * 32 bits, status flags If the client used the `NBD_CMD_FLAG_REQ_ONE` flag in the request, then every reply chunk MUST NOT contain more than one descriptor. Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in - its request, the server MAY return less descriptors in the reply + its request, the server MAY return fewer descriptors in the reply than would be required to fully specify the whole range of requested information to the client, if looking up the information would be too resource-intensive for the server, so long as at least one @@ -1462,8 +1502,13 @@ The following request types exist: * `NBD_CMD_BLOCK_STATUS` (7) A block status query request. Length and offset define the range of - interest. Clients MUST NOT use this request unless metadata - contexts have been negotiated, which in turn requires the client to + interest. + + A client MUST NOT send `NBD_CMD_BLOCK_STATUS` unless + within the negotiation phase it sent `NBD_OPT_SET_META_CONTEXT` + at least once, and the final time it was sent, the server + responded without an error, and returned at least one metadata + context. This in turn requires the client to first negotiate structured replies. For a successful return, the server MUST use a structured reply, containing at least one chunk of type `NBD_REPLY_TYPE_BLOCK_STATUS`, where the status field of each
(NB: I've already applied this and pushed it) * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries and add a count of queries so we can extend the command later (rather than rely on the length of option) * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero length query) list all contexts, as absence of any query is now simple. * Move definition of namespaces in the document to somewhere more appopriate. * Various other minor changes as discussed on the mailing list Signed-off-by: Alex Bligh <alex@alex.org.uk> --- doc/proto.md | 179 +++++++++++++++++++++++++++++++++++++---------------------- 1 file changed, 112 insertions(+), 67 deletions(-)