Message ID: 1477374761-25962-2-git-send-email-jgross@suse.com (mailing list archive)
State: New, archived
>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
> --- a/tools/xenstore/xenstored_transaction.c
> +++ b/tools/xenstore/xenstored_transaction.c
> @@ -79,6 +79,11 @@ struct transaction
>
> 	/* List of changed domains - to record the changed domain entry number */
> 	struct list_head changed_domains;
> +
> +	/* Temporary buffer for XS_DIRECTORY_PART. */
> +	char *dirpart_buf;
> +	unsigned buf_off;
> +	unsigned buf_len;
> };

The buffer you allocate for this can - by nature of this change - be
huge, and there's one per transaction. Isn't this causing a resource
utilization issue?

> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
> 	conn->transaction_started = 0;
> }
>
> +void send_directory_part(struct connection *conn, struct buffered_data *in)
> +{
> +	struct transaction *trans = conn->transaction;
> +	struct node *node;
> +	const char *name = onearg(in);
> +	unsigned len;
> +
> +	if (name == NULL || trans == NULL) {
> +		send_error(conn, EINVAL);
> +		return;
> +	}
> +
> +	if (name[0] == '@' && name[1] == 0) {
> +		if (trans->dirpart_buf == NULL) {
> +			send_error(conn, EINVAL);
> +			return;
> +		}
> +	} else {
> +		if (trans->dirpart_buf) {
> +			talloc_free(trans->dirpart_buf);
> +			trans->dirpart_buf = NULL;
> +		}
> +
> +		name = canonicalize(conn, name);
> +		node = get_node(conn, in, name, XS_PERM_READ);
> +		if (!node) {
> +			send_error(conn, errno);
> +			return;
> +		}
> +		trans->dirpart_buf = talloc_array(trans, char,
> +						  node->childlen + 1);

Once again, especially considering the buffer possibly being huge,
shouldn't you check for allocation failure here?

> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
> +		trans->dirpart_buf[node->childlen] = 0;
> +		trans->buf_off = 0;
> +		trans->buf_len = node->childlen + 1;
> +	}
> +
> +	if (trans->buf_len - trans->buf_off > 1024)

What has this 1024 been derived from? In particular, why is it not
(close to) the 4096 limit mentioned in the description?

> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;

Looking at construct_node() I'm getting the impression that there
are embedded nul characters in the ->children array, in which case
this strlen() would likely chop off values which could still fit,
requiring needlessly many iterations. And the explicit nul
termination after the allocation above would then also appear to be
unnecessary.

And then - why did you put this function here instead of in
xenstored_core.c, e.g. next to send_directory()?

Jan
On 25/10/16 11:06, Jan Beulich wrote:
>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>> --- a/tools/xenstore/xenstored_transaction.c
>> +++ b/tools/xenstore/xenstored_transaction.c
>> @@ -79,6 +79,11 @@ struct transaction
>>
>> 	/* List of changed domains - to record the changed domain entry number */
>> 	struct list_head changed_domains;
>> +
>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>> +	char *dirpart_buf;
>> +	unsigned buf_off;
>> +	unsigned buf_len;
>> };
>
> The buffer you allocate for this can - by nature of this change - be
> huge, and there's one per transaction. Isn't this causing a resource
> utilization issue?

It will be allocated only when needed and its size is less than that of
the node to be read.

>> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
>> 	conn->transaction_started = 0;
>> }
>>
>> +void send_directory_part(struct connection *conn, struct buffered_data *in)
>> +{
>> +	struct transaction *trans = conn->transaction;
>> +	struct node *node;
>> +	const char *name = onearg(in);
>> +	unsigned len;
>> +
>> +	if (name == NULL || trans == NULL) {
>> +		send_error(conn, EINVAL);
>> +		return;
>> +	}
>> +
>> +	if (name[0] == '@' && name[1] == 0) {
>> +		if (trans->dirpart_buf == NULL) {
>> +			send_error(conn, EINVAL);
>> +			return;
>> +		}
>> +	} else {
>> +		if (trans->dirpart_buf) {
>> +			talloc_free(trans->dirpart_buf);
>> +			trans->dirpart_buf = NULL;
>> +		}
>> +
>> +		name = canonicalize(conn, name);
>> +		node = get_node(conn, in, name, XS_PERM_READ);
>> +		if (!node) {
>> +			send_error(conn, errno);
>> +			return;
>> +		}
>> +		trans->dirpart_buf = talloc_array(trans, char,
>> +						  node->childlen + 1);
>
> Once again, especially considering the buffer possibly being huge,
> shouldn't you check for allocation failure here?

I followed coding style from xenstored. Modifying this style should be
another patch set IMHO (I'd be fine doing this).

>
>> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
>> +		trans->dirpart_buf[node->childlen] = 0;
>> +		trans->buf_off = 0;
>> +		trans->buf_len = node->childlen + 1;
>> +	}
>> +
>> +	if (trans->buf_len - trans->buf_off > 1024)
>
> What has this 1024 been derived from? In particular, why is it not
> (close to) the 4096 limit mentioned in the description?

In theory a single entry could be up to 2048 bytes long. I wanted to
keep the logic simple by not trying to iterate until the limit is
reached. I can change that, of course.

>
>> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
>
> Looking at construct_node() I'm getting the impression that there
> are embedded nul characters in the ->children array, in which case
> this strlen() would likely chop off values which could still fit, requiring
> needlessly many iterations. And the explicit nul termination after the
> allocation above would then also appear to be unnecessary.

Each children member is a nul terminated string, so I need to keep
the nul bytes in the result. And the final nul byte is the indicator
for the last entry being reached.

> And then - why did you put this function here instead of in
> xenstored_core.c, e.g. next to send_directory()?

It is using the xenstored_transaction.c private struct transaction.
Putting it in xenstored_core.c would have required to move the
struct definition into a header file.

Juergen
>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
> On 25/10/16 11:06, Jan Beulich wrote:
>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>> --- a/tools/xenstore/xenstored_transaction.c
>>> +++ b/tools/xenstore/xenstored_transaction.c
>>> @@ -79,6 +79,11 @@ struct transaction
>>>
>>> 	/* List of changed domains - to record the changed domain entry number */
>>> 	struct list_head changed_domains;
>>> +
>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>> +	char *dirpart_buf;
>>> +	unsigned buf_off;
>>> +	unsigned buf_len;
>>> };
>>
>> The buffer you allocate for this can - by nature of this change - be
>> huge, and there's one per transaction. Isn't this causing a resource
>> utilization issue?
>
> It will be allocated only when needed and its size is less than that of
> the node to be read.

That's not addressing my concern. A DomU could have multiple
transactions each with such an operation in progress. And the
space here doesn't get accounted against any of the quota.

>>> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
>>> 	conn->transaction_started = 0;
>>> }
>>>
>>> +void send_directory_part(struct connection *conn, struct buffered_data *in)
>>> +{
>>> +	struct transaction *trans = conn->transaction;
>>> +	struct node *node;
>>> +	const char *name = onearg(in);
>>> +	unsigned len;
>>> +
>>> +	if (name == NULL || trans == NULL) {
>>> +		send_error(conn, EINVAL);
>>> +		return;
>>> +	}
>>> +
>>> +	if (name[0] == '@' && name[1] == 0) {
>>> +		if (trans->dirpart_buf == NULL) {
>>> +			send_error(conn, EINVAL);
>>> +			return;
>>> +		}
>>> +	} else {
>>> +		if (trans->dirpart_buf) {
>>> +			talloc_free(trans->dirpart_buf);
>>> +			trans->dirpart_buf = NULL;
>>> +		}
>>> +
>>> +		name = canonicalize(conn, name);
>>> +		node = get_node(conn, in, name, XS_PERM_READ);
>>> +		if (!node) {
>>> +			send_error(conn, errno);
>>> +			return;
>>> +		}
>>> +		trans->dirpart_buf = talloc_array(trans, char,
>>> +						  node->childlen + 1);
>>
>> Once again, especially considering the buffer possibly being huge,
>> shouldn't you check for allocation failure here?
>
> I followed coding style from xenstored. Modifying this style should be
> another patch set IMHO (I'd be fine doing this).

Cloning buggy code is never a good idea imo.

>>> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
>>> +		trans->dirpart_buf[node->childlen] = 0;
>>> +		trans->buf_off = 0;
>>> +		trans->buf_len = node->childlen + 1;
>>> +	}
>>> +
>>> +	if (trans->buf_len - trans->buf_off > 1024)
>>
>> What has this 1024 been derived from? In particular, why is it not
>> (close to) the 4096 limit mentioned in the description?
>
> In theory a single entry could be up to 2048 bytes long. I wanted to
> keep the logic simple by not trying to iterate until the limit is
> reached. I can change that, of course.

While I think that would be worthwhile, I don't think this addresses
my comment: You don't really explain where the 1024 is coming from.
Instead to me your response looks to be related to the following
point.

>>> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
>>
>> Looking at construct_node() I'm getting the impression that there
>> are embedded nul characters in the ->children array, in which case
>> this strlen() would likely chop off values which could still fit, requiring
>> needlessly many iterations. And the explicit nul termination after the
>> allocation above would then also appear to be unnecessary.
>
> Each children member is a nul terminated string, so I need to keep
> the nul bytes in the result. And the final nul byte is the indicator
> for the last entry being reached.

Right, but this means you transfer only a single entry here. That was
the point of my original concern.

>> And then - why did you put this function here instead of in
>> xenstored_core.c, e.g. next to send_directory()?
>
> It is using the xenstored_transaction.c private struct transaction.
> Putting it in xenstored_core.c would have required to move the
> struct definition into a header file.

Yeah, I've realized this after having sent the reply, but then again
this also relates to the first question above (whether this should be
a per-transaction item in the first place). In particular I don't think
this being per transaction helps with allowing the guest to (easily)
run multiple such queries in parallel, as commonly simple operations
get run with null transactions.

Jan
On 25/10/16 15:20, Jan Beulich wrote:
>>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
>> On 25/10/16 11:06, Jan Beulich wrote:
>>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>>> --- a/tools/xenstore/xenstored_transaction.c
>>>> +++ b/tools/xenstore/xenstored_transaction.c
>>>> @@ -79,6 +79,11 @@ struct transaction
>>>>
>>>> 	/* List of changed domains - to record the changed domain entry number */
>>>> 	struct list_head changed_domains;
>>>> +
>>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>>> +	char *dirpart_buf;
>>>> +	unsigned buf_off;
>>>> +	unsigned buf_len;
>>>> };
>>>
>>> The buffer you allocate for this can - by nature of this change - be
>>> huge, and there's one per transaction. Isn't this causing a resource
>>> utilization issue?
>>
>> It will be allocated only when needed and its size is less than that of
>> the node to be read.
>
> That's not addressing my concern. A DomU could have multiple
> transactions each with such an operation in progress. And the
> space here doesn't get accounted against any of the quota.

It is accounted against the maximum number of transactions.

Additionally the buffer will be kept only if not all of the data
could be transferred in the first iteration of the read.

>
>>>> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
>>>> 	conn->transaction_started = 0;
>>>> }
>>>>
>>>> +void send_directory_part(struct connection *conn, struct buffered_data *in)
>>>> +{
>>>> +	struct transaction *trans = conn->transaction;
>>>> +	struct node *node;
>>>> +	const char *name = onearg(in);
>>>> +	unsigned len;
>>>> +
>>>> +	if (name == NULL || trans == NULL) {
>>>> +		send_error(conn, EINVAL);
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	if (name[0] == '@' && name[1] == 0) {
>>>> +		if (trans->dirpart_buf == NULL) {
>>>> +			send_error(conn, EINVAL);
>>>> +			return;
>>>> +		}
>>>> +	} else {
>>>> +		if (trans->dirpart_buf) {
>>>> +			talloc_free(trans->dirpart_buf);
>>>> +			trans->dirpart_buf = NULL;
>>>> +		}
>>>> +
>>>> +		name = canonicalize(conn, name);
>>>> +		node = get_node(conn, in, name, XS_PERM_READ);
>>>> +		if (!node) {
>>>> +			send_error(conn, errno);
>>>> +			return;
>>>> +		}
>>>> +		trans->dirpart_buf = talloc_array(trans, char,
>>>> +						  node->childlen + 1);
>>>
>>> Once again, especially considering the buffer possibly being huge,
>>> shouldn't you check for allocation failure here?
>>
>> I followed coding style from xenstored. Modifying this style should be
>> another patch set IMHO (I'd be fine doing this).
>
> Cloning buggy code is never a good idea imo.
>
>>>> +		memcpy(trans->dirpart_buf, node->children, node->childlen);
>>>> +		trans->dirpart_buf[node->childlen] = 0;
>>>> +		trans->buf_off = 0;
>>>> +		trans->buf_len = node->childlen + 1;
>>>> +	}
>>>> +
>>>> +	if (trans->buf_len - trans->buf_off > 1024)
>>>
>>> What has this 1024 been derived from? In particular, why is it not
>>> (close to) the 4096 limit mentioned in the description?
>>
>> In theory a single entry could be up to 2048 bytes long. I wanted to
>> keep the logic simple by not trying to iterate until the limit is
>> reached. I can change that, of course.
>
> While I think that would be worthwhile, I don't think this addresses
> my comment: You don't really explain where the 1024 is coming from.
> Instead to me your response looks to be related to the following
> point.

The 1024 is some arbitrary choice. I'll change it with the iteration
towards the allowed maximum added.

>
>>>> +		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
>>>
>>> Looking at construct_node() I'm getting the impression that there
>>> are embedded nul characters in the ->children array, in which case
>>> this strlen() would likely chop off values which could still fit, requiring
>>> needlessly many iterations. And the explicit nul termination after the
>>> allocation above would then also appear to be unnecessary.
>>
>> Each children member is a nul terminated string, so I need to keep
>> the nul bytes in the result. And the final nul byte is the indicator
>> for the last entry being reached.
>
> Right, but this means you transfer only a single entry here. That was
> the point of my original concern.

And only now I realize that you are right. I meant to transfer at
least 1024 bytes but didn't do so. I'll correct this.

>
>>> And then - why did you put this function here instead of in
>>> xenstored_core.c, e.g. next to send_directory()?
>>
>> It is using the xenstored_transaction.c private struct transaction.
>> Putting it in xenstored_core.c would have required to move the
>> struct definition into a header file.
>
> Yeah, I've realized this after having sent the reply, but then again
> this also relates to the first question above (whether this should be
> a per-transaction item in the first place). In particular I don't think
> this being per transaction helps with allowing the guest to (easily)
> run multiple such queries in parallel, as commonly simple operations
> get run with null transactions.

The XS_DIRECTORY_PART command is valid in a transaction only. This is
a prerequisite to be able to split the operation into multiple pieces
while keeping the consistency.

Juergen
>>> On 25.10.16 at 15:47, <JGross@suse.com> wrote:
> On 25/10/16 15:20, Jan Beulich wrote:
>>>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
>>> On 25/10/16 11:06, Jan Beulich wrote:
>>>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>>>> --- a/tools/xenstore/xenstored_transaction.c
>>>>> +++ b/tools/xenstore/xenstored_transaction.c
>>>>> @@ -79,6 +79,11 @@ struct transaction
>>>>>
>>>>> 	/* List of changed domains - to record the changed domain entry number */
>>>>> 	struct list_head changed_domains;
>>>>> +
>>>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>>>> +	char *dirpart_buf;
>>>>> +	unsigned buf_off;
>>>>> +	unsigned buf_len;
>>>>> };
>>>>
>>>> The buffer you allocate for this can - by nature of this change - be
>>>> huge, and there's one per transaction. Isn't this causing a resource
>>>> utilization issue?
>>>
>>> It will be allocated only when needed and its size is less than that of
>>> the node to be read.
>>
>> That's not addressing my concern. A DomU could have multiple
>> transactions each with such an operation in progress. And the
>> space here doesn't get accounted against any of the quota.
>
> It is accounted against the maximum number of transactions.
>
> Additionally the buffer will be kept only if not all of the data
> could be transferred in the first iteration of the read.

Which nevertheless means an ill behaved guest could (if its quota
allow) create a node with sufficiently many children and then start
as many transactions as it can, and do a partial directory listing
over each of them. Perhaps, considering that you mainly need this
for Dom0, the new verb could be restricted to the control domain
for now?

>>>> And then - why did you put this function here instead of in
>>>> xenstored_core.c, e.g. next to send_directory()?
>>>
>>> It is using the xenstored_transaction.c private struct transaction.
>>> Putting it in xenstored_core.c would have required to move the
>>> struct definition into a header file.
>>
>> Yeah, I've realized this after having sent the reply, but then again
>> this also relates to the first question above (whether this should be
>> a per-transaction item in the first place). In particular I don't think
>> this being per transaction helps with allowing the guest to (easily)
>> run multiple such queries in parallel, as commonly simple operations
>> get run with null transactions.
>
> The XS_DIRECTORY_PART command is valid in a transaction only. This is
> a prerequisite to be able to split the operation into multiple pieces
> while keeping the consistency.

But I'm sure other models could be come up with, ideally without
requiring meaningful amounts of temporary storage. For instance
nodes could be versioned, and if the version changed between
iterations, the caller could be signaled to start over. And it looks
like such a version wouldn't even need to be bumped for child
additions, but only for child removals.

Jan
On 25/10/16 16:02, Jan Beulich wrote:
>>>> On 25.10.16 at 15:47, <JGross@suse.com> wrote:
>> On 25/10/16 15:20, Jan Beulich wrote:
>>>>>> On 25.10.16 at 13:41, <JGross@suse.com> wrote:
>>>> On 25/10/16 11:06, Jan Beulich wrote:
>>>>>>>> On 25.10.16 at 07:52, <JGross@suse.com> wrote:
>>>>>> --- a/tools/xenstore/xenstored_transaction.c
>>>>>> +++ b/tools/xenstore/xenstored_transaction.c
>>>>>> @@ -79,6 +79,11 @@ struct transaction
>>>>>>
>>>>>> 	/* List of changed domains - to record the changed domain entry number */
>>>>>> 	struct list_head changed_domains;
>>>>>> +
>>>>>> +	/* Temporary buffer for XS_DIRECTORY_PART. */
>>>>>> +	char *dirpart_buf;
>>>>>> +	unsigned buf_off;
>>>>>> +	unsigned buf_len;
>>>>>> };
>>>>>
>>>>> The buffer you allocate for this can - by nature of this change - be
>>>>> huge, and there's one per transaction. Isn't this causing a resource
>>>>> utilization issue?
>>>>
>>>> It will be allocated only when needed and its size is less than that of
>>>> the node to be read.
>>>
>>> That's not addressing my concern. A DomU could have multiple
>>> transactions each with such an operation in progress. And the
>>> space here doesn't get accounted against any of the quota.
>>
>> It is accounted against the maximum number of transactions.
>>
>> Additionally the buffer will be kept only if not all of the data
>> could be transferred in the first iteration of the read.
>
> Which nevertheless means an ill behaved guest could (if its quota
> allow) create a node with sufficiently many children and then start
> as many transactions as it can, and do a partial directory listing
> over each of them. Perhaps, considering that you mainly need this
> for Dom0, the new verb could be restricted to the control domain
> for now?

What about driver domains? They might need it, too.

Regarding memory usage: current limits for a guest if no quota have
been specified are:
- 1000 nodes
- max. entry size of 2048 bytes
- 10 transactions

So each node will need max. 4kB (2048 bytes entry, 2048 bytes name),
total node memory will be 4MB for this guest.

Lets assume the domU puts all those nodes in just one directory and
tries to read it in all possible 10 transactions. For each transaction
we'll need:
- 2MB temp buffer (1000 nodes * 2048 bytes name)
- at least 4MB data base copy (each transaction needs a complete copy
  of the data base, which is one of the main reasons I want to rewrite
  the transaction handling)

So an ill-behaved domU can force xenstored to use at least 64MB of
memory. The (minimal) memory amount for n ill-behaved domUs is:

n * (24 MB + n * 40MB)

n=1:  64MB
n=2: 208MB
n=3: 432MB
n=4: 736MB

Without the XS_DIRECTORY_PART support the numbers are rather similar:
n * n * 40MB total memory, so e.g. 640MB for 4 evil domUs.

>
>>>>> And then - why did you put this function here instead of in
>>>>> xenstored_core.c, e.g. next to send_directory()?
>>>>
>>>> It is using the xenstored_transaction.c private struct transaction.
>>>> Putting it in xenstored_core.c would have required to move the
>>>> struct definition into a header file.
>>>
>>> Yeah, I've realized this after having sent the reply, but then again
>>> this also relates to the first question above (whether this should be
>>> a per-transaction item in the first place). In particular I don't think
>>> this being per transaction helps with allowing the guest to (easily)
>>> run multiple such queries in parallel, as commonly simple operations
>>> get run with null transactions.
>>
>> The XS_DIRECTORY_PART command is valid in a transaction only. This is
>> a prerequisite to be able to split the operation into multiple pieces
>> while keeping the consistency.
>
> But I'm sure other models could be come up with, ideally without
> requiring meaningful amounts of temporary storage. For instance
> nodes could be versioned, and if the version changed between
> iterations, the caller could be signaled to start over. And it looks
> like such a version wouldn't even need to be bumped for child
> additions, but only for child removals.

In case nobody objects to that idea I'll look into it. The node
version might even help for better transaction support.

Juergen
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 3df977b..9667ce5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1327,6 +1327,10 @@ static void process_message(struct connection *conn, struct buffered_data *in)
 		do_reset_watches(conn, in);
 		break;
 
+	case XS_DIRECTORY_PART:
+		send_directory_part(conn, in);
+		break;
+
 	default:
 		eprintf("Client unknown operation %i", in->hdr.msg.type);
 		send_error(conn, ENOSYS);
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index 34720fa..22a51da 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -79,6 +79,11 @@ struct transaction
 
 	/* List of changed domains - to record the changed domain entry number */
 	struct list_head changed_domains;
+
+	/* Temporary buffer for XS_DIRECTORY_PART. */
+	char *dirpart_buf;
+	unsigned buf_off;
+	unsigned buf_len;
 };
 
 extern int quota_max_transaction;
@@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
 	conn->transaction_started = 0;
 }
 
+void send_directory_part(struct connection *conn, struct buffered_data *in)
+{
+	struct transaction *trans = conn->transaction;
+	struct node *node;
+	const char *name = onearg(in);
+	unsigned len;
+
+	if (name == NULL || trans == NULL) {
+		send_error(conn, EINVAL);
+		return;
+	}
+
+	if (name[0] == '@' && name[1] == 0) {
+		if (trans->dirpart_buf == NULL) {
+			send_error(conn, EINVAL);
+			return;
+		}
+	} else {
+		if (trans->dirpart_buf) {
+			talloc_free(trans->dirpart_buf);
+			trans->dirpart_buf = NULL;
+		}
+
+		name = canonicalize(conn, name);
+		node = get_node(conn, in, name, XS_PERM_READ);
+		if (!node) {
+			send_error(conn, errno);
+			return;
+		}
+		trans->dirpart_buf = talloc_array(trans, char,
+						  node->childlen + 1);
+		memcpy(trans->dirpart_buf, node->children, node->childlen);
+		trans->dirpart_buf[node->childlen] = 0;
+		trans->buf_off = 0;
+		trans->buf_len = node->childlen + 1;
+	}
+
+	if (trans->buf_len - trans->buf_off > 1024)
+		len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
+	else
+		len = trans->buf_len - trans->buf_off;
+
+	send_reply(conn, XS_DIRECTORY_PART, trans->dirpart_buf + trans->buf_off,
+		   len);
+
+	trans->buf_off += len;
+	if (trans->buf_off == trans->buf_len) {
+		talloc_free(trans->dirpart_buf);
+		trans->dirpart_buf = NULL;
+	}
+}
+
 /*
  * Local variables:
  * c-file-style: "linux"
diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
index 0c868ee..7e9e187 100644
--- a/tools/xenstore/xenstored_transaction.h
+++ b/tools/xenstore/xenstored_transaction.h
@@ -38,5 +38,6 @@ void add_change_node(struct transaction *trans, const char *node,
 
 TDB_CONTEXT *tdb_transaction_context(struct transaction *trans);
 void conn_delete_all_transactions(struct connection *conn);
+void send_directory_part(struct connection *conn, struct buffered_data *in);
 
 #endif /* _XENSTORED_TRANSACTION_H */
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 0a0cdbc..545f916 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -50,6 +50,7 @@ enum xsd_sockmsg_type
     XS_SET_TARGET,
     XS_RESTRICT,
     XS_RESET_WATCHES,
+    XS_DIRECTORY_PART,
 
     XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
 };
As the payload size for one xenstore wire command is limited to 4096
bytes it is impossible to read the children names of a node with a
large number of children (e.g. /local/domain in case of a host with
more than about 2000 domains). This effectively limits the maximum
number of domains a host can support.

In order to support such long directory outputs add a new wire command
XS_DIRECTORY_PART which will return only some entries in each call and
can be called in a loop to get all entries. For this to work reliably
the loop using XS_DIRECTORY_PART until no further entries are returned
must be in one transaction.

Using XS_DIRECTORY_PART with a valid path will start the transmission
by copying the list of children to a xenstored internal buffer linked
to the transaction. Further calls of XS_DIRECTORY_PART with a path "@"
will advance in the buffer. The end of the output is indicated by an
empty child name.

The internal buffer is released when:
- the buffer is exhausted
- XS_DIRECTORY or XS_DIRECTORY_PART with a valid path is called (this
  will allocate a new buffer, of course)
- the transaction is terminated (explicit or implicit termination)

The number of entries returned for each call is implementation
specific. The only guarantee is that no call will exceed the limit of
4096 bytes returned.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/xenstore/xenstored_core.c        |  4 +++
 tools/xenstore/xenstored_transaction.c | 57 ++++++++++++++++++++++++++++++++++
 tools/xenstore/xenstored_transaction.h |  1 +
 xen/include/public/io/xs_wire.h        |  1 +
 4 files changed, 63 insertions(+)