
[v1,2/2] oxenstored: make Quota.t pure

Message ID f98edc633527b6d9a6855af0aff4fb77970454cc.1706697858.git.edwin.torok@cloud.com (mailing list archive)
State New, archived
Series reduce oxenstored quota processing overhead under load

Commit Message

Edwin Torok Jan. 31, 2024, 10:52 a.m. UTC
Now that we no longer have a hashtable inside we can make Quota.t pure,
and push the mutable update to its callers.
Store.t already had a mutable Quota.t field.

No functional change.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
 tools/ocaml/xenstored/quota.ml |  8 ++++----
 tools/ocaml/xenstored/store.ml | 17 ++++++++++-------
 2 files changed, 14 insertions(+), 11 deletions(-)
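
In essence, each mutating call site becomes a functional update that rebinds the one mutable Quota.t field on Store.t. The pattern, simplified from the diff below:

(* before: add_entry mutates quota.cur in place *)
Quota.add_entry store.quota owner

(* after: add_entry returns a fresh Quota.t and the caller rebinds
   the store's single mutable field *)
store.quota <- Quota.add_entry store.quota owner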

Comments

Christian Lindig Jan. 31, 2024, 11:17 a.m. UTC | #1
> On 31 Jan 2024, at 10:52, Edwin Török <edwin.torok@cloud.com> wrote:
> 
> Now that we no longer have a hashtable inside we can make Quota.t pure,
> and push the mutable update to its callers.
> Store.t already had a mutable Quota.t field.
> 
> No functional change.

Acked-by: Christian Lindig <christian.lindig@cloud.com>

This is shifting copying work to GC work, at least potentially. I would agree that this is a good trade-off and the code looks correct to me. But I think we should see more testing and benchmarking before merging this, unless we are fine merging speculative improvements.

— C
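
To illustrate the trade-off Christian describes: snapshotting a mutable Hashtbl costs an eager O(n) copy of every binding, while a persistent Map is "snapshotted" by simply sharing the value; each update instead allocates a few fresh nodes, and the unshared nodes of older versions become work for the GC. A minimal standalone sketch (not the oxenstored code):

module M = Map.Make(Int)

(* eager: copies every binding at snapshot time *)
let snapshot_hashtbl (tbl : (int, int) Hashtbl.t) = Hashtbl.copy tbl

(* persistent: a snapshot is O(1), just share the old value *)
let snapshot_map (m : int M.t) = m

(* each update allocates ~O(log n) fresh nodes; nodes of the previous
   version that are no longer shared become garbage for the GC *)
let update_map m k v = M.add k v m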
Edwin Torok Jan. 31, 2024, 4:27 p.m. UTC | #2
> On 31 Jan 2024, at 11:17, Christian Lindig <christian.lindig@cloud.com> wrote:
> 
> 
> 
>> On 31 Jan 2024, at 10:52, Edwin Török <edwin.torok@cloud.com> wrote:
>> 
>> Now that we no longer have a hashtable inside we can make Quota.t pure,
>> and push the mutable update to its callers.
>> Store.t already had a mutable Quota.t field.
>> 
>> No functional change.
> 
> Acked-by: Christian Lindig <christian.lindig@cloud.com>
> 
> This is shifting copying work to GC work, at least potentially. I would agree that this is a good trade-off and the code looks correct to me. But I think we should see more testing and benchmarking before merging this, unless we are fine merging speculative improvements.
> 
> — C
> 
> 


I’ve written this [1] microbenchmark which creates ~300_000 entries in xenstore, assigns quota to 1000 domains and then measures how long it takes to list xenstore.
(It doesn’t matter whether these domains exist or not.)
It shows a measurable improvement with this patch; once I’ve run a more realistic test on the original machine with 1000 real VMs, I’ll post those results too:

On an Intel Xeon Gold 6354 @ 3.0 GHz:
* without my patch: 12.756 +- 0.152 seconds time elapsed  ( +-  1.19% )
* with my patch: 5.6051 +- 0.0467 seconds time elapsed  ( +-  0.83% )

[1]
# cat >create.py <<EOF
#!/usr/bin/env python3
from xen.lowlevel.xs import xs

xenstore = xs()

for i in range(1,1000):
  txn = xenstore.transaction_start()
  for j in range(1,10):
    for k in range(1,30):
        path=f"/quotatest/{i}/{j}/{k}/x"
        xenstore.write(txn, path, "val")
        # assign quota to domid i
        xenstore.set_permissions(txn, path, [{"dom": i}])
  xenstore.transaction_end(txn)
EOF
# python3 create.py
# perf stat -r 10 sh -c 'xenstore-ls $(for i in $(seq 1 100); do echo "/quotatest/$i"; done) >/dev/null'

Best regards,
—Edwin
Edwin Torok Feb. 23, 2024, 11:35 a.m. UTC | #3
> On 31 Jan 2024, at 16:27, Edwin Torok <edwin.torok@cloud.com> wrote:
> 
> 
> 
>> On 31 Jan 2024, at 11:17, Christian Lindig <christian.lindig@cloud.com> wrote:
>> 
>> 
>> 
>>> On 31 Jan 2024, at 10:52, Edwin Török <edwin.torok@cloud.com> wrote:
>>> 
>>> Now that we no longer have a hashtable inside we can make Quota.t pure,
>>> and push the mutable update to its callers.
>>> Store.t already had a mutable Quota.t field.
>>> 
>>> No functional change.
>> 
>> Acked-by: Christian Lindig <christian.lindig@cloud.com>
>> 
>> This is shifting copying work to GC work, at least potentially. I would agree that this is a good trade-off and the code looks correct to me. But I think we should see more testing and benchmarking before merging this, unless we are fine merging speculative improvements.
>> 
>> — C
>> 
>> 
> 
> 
> I’ve written this [1] microbenchmark which creates ~300_000 entries in xenstore, assigns quota to 1000 domains and then measures how long it takes to list xenstore.
> (It doesn’t matter whether these domains exist or not.)
> It shows a measurable improvement with this patch; once I’ve run a more realistic test on the original machine with 1000 real VMs, I’ll post those results too:

The machine that can run this test is offline now due to a lab move, but I managed to get this data before it went away, and I think this patch series is ready to be committed.

Flamegraph without my changes, where Hashtbl.copy takes up a significant amount of oxenstored time: https://cdn.jsdelivr.net/gh/edwintorok/xen@oxenstored-coverletter/docs/original.svg?x=153.0&y=1269
Flamegraph with this patch series, where Hashtbl.copy does not show up at all: https://cdn.jsdelivr.net/gh/edwintorok/xen@oxenstored-coverletter/docs/oxenstored_no_hashtbl_copy.svg?x=294.3&y=1301
(There are of course still hashtables in the flame graph, due to the well-known inefficient poll_select implementation, and we see hashtable iteration as a parent caller, which is fine.)

IMHO this means the patch series is a worthwhile improvement: it completely removes a codepath that was previously a hotspot in oxenstored.

The timings on the test also show improvements with this patch:

Booting 575 VMs:
* without this patch series: 1099s
* with this patch series: 604s

Booting 626 VMs:
* without this patch series: 4027s
* with this patch series: 2115s

Booting 627 VMs:
* without this patch series: 4038s
* with this patch series: 4120s

This shows that *with* the patch series the system scales better, until it hits a breaking point around 627 VMs, where everything is ~equally slow.

Not everything is ideal; there is also a slowdown around the 789-booted-VMs mark:
* without this patch series: 168s
* with this patch series: 394s
I wouldn’t worry about this result: by that point some VMs have already crashed, and I’d consider the test to have failed. The results you get then depend on how much CPU oxenstored gets compared to qemu’s ioreq handler, which is spinning handling the crash on the serial port.
Actually reaching 1000 VMs will require more fixes outside of oxenstored (e.g. using correct groups with tapdisk, qemu, etc., or making qemu cope better with flooding on the serial port from the guest), and probably some fixes to the poll_select in oxenstored, which I may address in a future patch series.



Best regards,
—Edwin

> 
> On an Intel Xeon Gold 6354 @ 3.0 GHz:
> * without my patch: 12.756 +- 0.152 seconds time elapsed  ( +-  1.19% )
> * with my patch: 5.6051 +- 0.0467 seconds time elapsed  ( +-  0.83% )
> 
> [1]
> # cat >create.py <<EOF
> #!/usr/bin/env python3
> from xen.lowlevel.xs import xs
> 
> xenstore = xs()
> 
> for i in range(1,1000):
>   txn = xenstore.transaction_start()
>   for j in range(1,10):
>     for k in range(1,30):
>         path=f"/quotatest/{i}/{j}/{k}/x"
>         xenstore.write(txn, path, "val")
>         # assign quota to domid i
>         xenstore.set_permissions(txn, path, [{"dom": i}])
>   xenstore.transaction_end(txn)
> EOF
> # python3 create.py
> # perf stat -r 10 sh -c 'xenstore-ls $(for i in $(seq 1 100); do echo "/quotatest/$i"; done) >/dev/null'
> 
> Best regards,
> —Edwin
Andrew Cooper Feb. 23, 2024, 12:06 p.m. UTC | #4
On 23/02/2024 11:35 am, Edwin Torok wrote:
>> On 31 Jan 2024, at 16:27, Edwin Torok <edwin.torok@cloud.com> wrote:
>>> On 31 Jan 2024, at 11:17, Christian Lindig
>>> <christian.lindig@cloud.com> wrote:
>>>> On 31 Jan 2024, at 10:52, Edwin Török <edwin.torok@cloud.com> wrote:
>>>>
>>>> Now that we no longer have a hashtable inside we can make Quota.t pure,
>>>> and push the mutable update to its callers.
>>>> Store.t already had a mutable Quota.t field.
>>>>
>>>> No functional change.
>>>
>>> Acked-by: Christian Lindig <christian.lindig@cloud.com>
>>>
>>> This is shifting copying work to GC work, at least potentially. I
>>> would agree that this is a good trade-off and the code looks correct
>>> to me. But I think we should see more testing and benchmarking
>>> before merging this, unless we are fine merging speculative improvements.
>>>
>>> — C
>>>
>>>
>>
>>
>> I’ve written this [1] microbenchmark which creates ~300_000 entries
>> in xenstore, assigns quota to 1000 domains and then measures how long
>> it takes to list xenstore.
>> (It doesn’t matter whether these domains exist or not.)
>> It shows a measurable improvement with this patch; once I’ve run a
>> more realistic test on the original machine with 1000 real VMs, I’ll
>> post those results too:
>
> The machine that can run this test is offline now due to a lab move,
> but I managed to get this data before it went away, and I think this
> patch series is ready to be committed.
>
> Flamegraph without my changes, where Hashtbl.copy takes up a
> significant amount of oxenstored
> time: https://cdn.jsdelivr.net/gh/edwintorok/xen@oxenstored-coverletter/docs/original.svg?x=153.0&y=1269
> Flamegraph with this patch series, where Hashtbl.copy does not show up
> at
> all: https://cdn.jsdelivr.net/gh/edwintorok/xen@oxenstored-coverletter/docs/oxenstored_no_hashtbl_copy.svg?x=294.3&y=1301
> (There are of course still hashtables in the flame graph, due to the
> well-known inefficient poll_select implementation, and we see hashtable
> iteration as a parent caller, which is fine.)
>
> IMHO this means the patch series is a worthwhile improvement: it
> completely removes a codepath that was previously a hotspot in
> oxenstored.

Agreed.  I'll queue this series in due course.

~Andrew

Patch

diff --git a/tools/ocaml/xenstored/quota.ml b/tools/ocaml/xenstored/quota.ml
index f6e28ecc6a..1f652040d8 100644
--- a/tools/ocaml/xenstored/quota.ml
+++ b/tools/ocaml/xenstored/quota.ml
@@ -33,7 +33,7 @@  module DomidMap = Map.Make(Domid)
 type t = {
   maxent: int;               (* max entities per domU *)
   maxsize: int;              (* max size of data store in one node *)
-  mutable cur: int DomidMap.t; (* current domains quota *)
+  cur: int DomidMap.t; (* current domains quota *)
 }
 
 let to_string quota domid =
@@ -76,10 +76,10 @@  let update_entry quota_cur id diff =
   else DomidMap.add id nb quota_cur
 
 let del_entry quota id =
-  quota.cur <- update_entry quota.cur id (-1)
+  {quota with cur = update_entry quota.cur id (-1)}
 
 let add_entry quota id =
-  quota.cur <- update_entry quota.cur id (+1)
+  {quota with cur = update_entry quota.cur id (+1)}
 
 let merge orig_quota mod_quota dest_quota =
   let fold_merge id nb dest =
@@ -87,5 +87,5 @@  let merge orig_quota mod_quota dest_quota =
     | 0 -> dest (* not modified *)
     | diff -> update_entry dest id diff (* update with [x=x+diff] *)
   in
-  dest_quota.cur <- DomidMap.fold fold_merge mod_quota.cur dest_quota.cur
+  {dest_quota with cur = DomidMap.fold fold_merge mod_quota.cur dest_quota.cur}
   (* dest_quota = dest_quota + (mod_quota - orig_quota) *)
diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
index 38a4945372..9b8dd2812d 100644
--- a/tools/ocaml/xenstored/store.ml
+++ b/tools/ocaml/xenstored/store.ml
@@ -85,7 +85,9 @@  module Node = struct
       raise Define.Permission_denied;
     end
 
-  let rec recurse fct node = fct node; SymbolMap.iter (fun _ -> recurse fct) node.children
+  let rec recurse fct node acc =
+    let acc = fct node acc in
+    SymbolMap.fold (fun _ -> recurse fct) node.children acc
 
   (** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
       possibly removing some nodes.
@@ -408,7 +410,7 @@  let dump_buffer store = dump_store_buf store.root
 let set_node store path node orig_quota mod_quota =
   let root = Path.set_node store.root path node in
   store.root <- root;
-  Quota.merge orig_quota mod_quota store.quota
+  store.quota <- Quota.merge orig_quota mod_quota store.quota
 
 let write store perm path value =
   let node, existing = get_deepest_existing_node store path in
@@ -422,7 +424,7 @@  let write store perm path value =
   let root, node_created = path_write store perm path value in
   store.root <- root;
   if node_created
-  then Quota.add_entry store.quota owner
+  then store.quota <- Quota.add_entry store.quota owner
 
 let mkdir store perm path =
   let node, existing = get_deepest_existing_node store path in
@@ -431,7 +433,7 @@  let mkdir store perm path =
   if not (existing || (Perms.Connection.is_dom0 perm)) then Quota.check store.quota owner 0;
   store.root <- path_mkdir store perm path;
   if not existing then
-    Quota.add_entry store.quota owner
+    store.quota <- Quota.add_entry store.quota owner
 
 let rm store perm path =
   let rmed_node = Path.get_node store.root path in
@@ -439,7 +441,7 @@  let rm store perm path =
   | None -> raise Define.Doesnt_exist
   | Some rmed_node ->
     store.root <- path_rm store perm path;
-    Node.recurse (fun node -> Quota.del_entry store.quota (Node.get_owner node)) rmed_node
+    store.quota <- Node.recurse (fun node quota -> Quota.del_entry quota (Node.get_owner node)) rmed_node store.quota
 
 let setperms store perm path nperms =
   match Path.get_node store.root path with
@@ -450,8 +452,9 @@  let setperms store perm path nperms =
     if not ((old_owner = new_owner) || (Perms.Connection.is_dom0 perm)) then
       raise Define.Permission_denied;
     store.root <- path_setperms store perm path nperms;
-    Quota.del_entry store.quota old_owner;
-    Quota.add_entry store.quota new_owner
+    store.quota <-
+      let quota = Quota.del_entry store.quota old_owner in
+      Quota.add_entry quota new_owner
 
 let reset_permissions store domid =
   Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
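
For reference, a minimal standalone sketch (hypothetical simplified types, not the patch itself) of how the fold-based Node.recurse threads the pure quota value through a recursive delete, as in the rm hunk above:

module IntMap = Map.Make(Int)

(* per-domain entry counts, like the [cur] field of Quota.t *)
type quota = int IntMap.t

(* simplified del_entry: decrement the owner's count, dropping
   entries that reach zero *)
let del_entry (q : quota) owner =
  match IntMap.find_opt owner q with
  | Some 1 -> IntMap.remove owner q
  | Some n -> IntMap.add owner (n - 1) q
  | None -> q

type node = { owner : int; children : node list }

(* fold [fct] over the subtree, threading an accumulator instead of
   mutating shared state *)
let rec recurse fct node acc =
  let acc = fct node acc in
  List.fold_left (fun acc child -> recurse fct child acc) acc node.children

(* the [rm] pattern: one pure pass computes the updated quota *)
let quota_after_rm q rmed_node =
  recurse (fun node q -> del_entry q node.owner) rmed_node q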