diff mbox series

[RFC,XEN,for-4.13,1/4] libxl: Introduce libxl__ev_child_kill

Message ID 20191025170505.2834957-2-anthony.perard@citrix.com (mailing list archive)
State Superseded
Headers show
Series Fix: libxl workaround, multiple connection to single QMP socket | expand

Commit Message

Anthony PERARD Oct. 25, 2019, 5:05 p.m. UTC
Allow to kill a child and deregister the callback associated with it.

The death isn't immediate will need to be collected later, so the
ev_child machinery register its own callback.

libxl__ev_child_kill() might be called by an AO operation that is
finishing/cleaning up without a chance for libxl to be notified of the
child death (via SIGCHLD). So it is possible that the application
calls libxl_ctx_free() while there are still child around. To avoid
the application getting unexpected SIGCHLD, the libxl__ao responsible
for killing a child will have to wait until it has been properly
reaped.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---
 tools/libxl/libxl_event.c    |  3 +-
 tools/libxl/libxl_fork.c     | 55 ++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |  8 ++++++
 3 files changed, 65 insertions(+), 1 deletion(-)

Comments

Ian Jackson Oct. 28, 2019, 11:26 a.m. UTC | #1
Hi.  Thanks.  The code here looks by and large good to me but I think
the docs and maybe the log messages need improvement.

Anthony PERARD writes ("[RFC XEN PATCH for-4.13 1/4] libxl: Introduce libxl__ev_child_kill"):
> Allow to kill a child and deregister the callback associated with it.

Did you read the doc comment above libxl__ev_child_fork, in
libxl_internal.h near line 1160 ?  The user of libxl__ev_child is
already permitted to kill the child.

In this patch are adding a layer to make this more convenient, and in
particular to let a libxl__ev_child user transfer responsibility for
reaping the child from its own application logic into the ao system.

Some more API documentation to make this much more explicit would be
good - ie the main doc comment the facility needs to discuss it:
 | * It is not possible to "deregister" the child death event source
^ this is no longer true after your patch; indeed that's the point.

So perhaps

> +void libxl__ev_child_kill(libxl__ao *ao, libxl__ev_child *ch)

should be called
   libxl__ev_child_reattach_to_ao
or
   libxl__ev_child_kill_deregister
or something, and maybe it should take a signal number ?

> +static void deregistered_child_callback(libxl__egc *egc,
> +                                        libxl__ev_child *ch,
> +                                        pid_t pid,
> +                                        int status)
> +{
> +    ev_child_killed *ck = CONTAINER_OF(ch, *ck, ch);
> +    EGC_GC;
> +
> +    if (status) {
> +        libxl_report_child_exitstatus(CTX, XTL_ERROR,
> +                                      "killed fork (dying as expected)",
> +                                      pid, status);
> +    } else {
> +        LOG(DEBUG, "killed child exit cleanly, unexpected");

I don't think this is entirely unexpected.  Maybe the child was just
exiting at the point where libxl__ev_child_kill was called.

And, please check log the actual whole exit status.  "status" is a
wait status.  We want to know what signal it died from, whether it
core dumped, the exit status, etc.  Probably, you should call
libxl_report_child_exitstatus.

> @@ -1891,7 +1891,8 @@ static bool ao_work_outstanding(libxl__ao *ao)
>       * decrement progress_reports_outstanding, and call
>       * libxl__ao_complete_check_progress_reports.
>       */
> -    return !ao->complete || ao->progress_reports_outstanding;
> +    return !ao->complete || ao->progress_reports_outstanding
> +        || ao->outstanding_killed_child;
>  }

I wonder if this should gain a new debug message.  If the child gets
lost or stuck for some reason, it will otherwise require searching the
past log to find out why the ao doesn't return.

Ian.
Anthony PERARD Oct. 28, 2019, 5:01 p.m. UTC | #2
On Mon, Oct 28, 2019 at 11:26:41AM +0000, Ian Jackson wrote:
> Hi.  Thanks.  The code here looks by and large good to me but I think
> the docs and maybe the log messages need improvement.
> 
> Anthony PERARD writes ("[RFC XEN PATCH for-4.13 1/4] libxl: Introduce libxl__ev_child_kill"):
> > Allow to kill a child and deregister the callback associated with it.
> 
> Did you read the doc comment above libxl__ev_child_fork, in
> libxl_internal.h near line 1160 ?  The user of libxl__ev_child is
> already permitted to kill the child.
> 
> In this patch are adding a layer to make this more convenient, and in
> particular to let a libxl__ev_child user transfer responsibility for
> reaping the child from its own application logic into the ao system.
> 
> Some more API documentation to make this much more explicit would be
> good - ie the main doc comment the facility needs to discuss it:
>  | * It is not possible to "deregister" the child death event source
> ^ this is no longer true after your patch; indeed that's the point.
> 
> So perhaps
> 
> > +void libxl__ev_child_kill(libxl__ao *ao, libxl__ev_child *ch)
> 
> should be called
>    libxl__ev_child_reattach_to_ao
> or
>    libxl__ev_child_kill_deregister
> or something, and maybe it should take a signal number ?

I'll rework the documentation to explain that the AO won't complete
until the child has been reaped. Adding the signal number to the
parameter and renaming the function _kill_derigister sound good.

> > +static void deregistered_child_callback(libxl__egc *egc,
> > +                                        libxl__ev_child *ch,
> > +                                        pid_t pid,
> > +                                        int status)
> > +{
> > +    ev_child_killed *ck = CONTAINER_OF(ch, *ck, ch);
> > +    EGC_GC;
> > +
> > +    if (status) {
> > +        libxl_report_child_exitstatus(CTX, XTL_ERROR,
> > +                                      "killed fork (dying as expected)",
> > +                                      pid, status);
> > +    } else {
> > +        LOG(DEBUG, "killed child exit cleanly, unexpected");
> 
> I don't think this is entirely unexpected.  Maybe the child was just
> exiting at the point where libxl__ev_child_kill was called.
> 
> And, please check log the actual whole exit status.  "status" is a
> wait status.  We want to know what signal it died from, whether it
> core dumped, the exit status, etc.  Probably, you should call
> libxl_report_child_exitstatus.

It does ;-). But I guess I could call libxl_report_child_exitstatus()
unconditionally, so even if status=0.

> > @@ -1891,7 +1891,8 @@ static bool ao_work_outstanding(libxl__ao *ao)
> >       * decrement progress_reports_outstanding, and call
> >       * libxl__ao_complete_check_progress_reports.
> >       */
> > -    return !ao->complete || ao->progress_reports_outstanding;
> > +    return !ao->complete || ao->progress_reports_outstanding
> > +        || ao->outstanding_killed_child;
> >  }
> 
> I wonder if this should gain a new debug message.  If the child gets
> lost or stuck for some reason, it will otherwise require searching the
> past log to find out why the ao doesn't return.

Do you mean adding a debug message in libxl__ev_child_kill_deregister()?
It's probably a good idea.

I'll add:
    LOG(DEBUG, "ao %p: Will wait process [%ld] death", ao, pid);

Or should we also add a debug log in libxl__ao_complete() ?
Ian Jackson Oct. 29, 2019, 2:17 p.m. UTC | #3
Anthony PERARD writes ("Re: [RFC XEN PATCH for-4.13 1/4] libxl: Introduce libxl__ev_child_kill"):
...
> > > +    if (status) {
> > > +        libxl_report_child_exitstatus(CTX, XTL_ERROR,
> > > +                                      "killed fork (dying as expected)",
> > > +                                      pid, status);
> > > +    } else {
> > > +        LOG(DEBUG, "killed child exit cleanly, unexpected");
> > 
> > I don't think this is entirely unexpected.  Maybe the child was just
> > exiting at the point where libxl__ev_child_kill was called.
> > 
> > And, please check log the actual whole exit status.  "status" is a
> > wait status.  We want to know what signal it died from, whether it
> > core dumped, the exit status, etc.  Probably, you should call
> > libxl_report_child_exitstatus.
> 
> It does ;-).

Oh.  Err.  Yes.

>   But I guess I could call libxl_report_child_exitstatus()
> unconditionally, so even if status=0.

I think that would be fine.  I'm not sure this code knows what set of
exit statuses are plausible so I think calling all kinds of exit
`expected' seems best to me.

> > > @@ -1891,7 +1891,8 @@ static bool ao_work_outstanding(libxl__ao *ao)
> > >       * decrement progress_reports_outstanding, and call
> > >       * libxl__ao_complete_check_progress_reports.
> > >       */
> > > -    return !ao->complete || ao->progress_reports_outstanding;
> > > +    return !ao->complete || ao->progress_reports_outstanding
> > > +        || ao->outstanding_killed_child;
> > >  }
> > 
> > I wonder if this should gain a new debug message.  If the child gets
> > lost or stuck for some reason, it will otherwise require searching the
> > past log to find out why the ao doesn't return.
> 
> Do you mean adding a debug message in libxl__ev_child_kill_deregister()?
> It's probably a good idea.
...
> Or should we also add a debug log in libxl__ao_complete() ?

The latter, yes.  Because that happens at the point where the AO is
otherwise complete.  So if one is reading the log to try to find out
why the thing hasn't completed the debug log will actually say
something about it where you're looking, rather than miles away
somewhere in the scroll.

Thanks,
Ian.
diff mbox series

Patch

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 0370b6acdd1c..f57c16da1fd9 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1891,7 +1891,8 @@  static bool ao_work_outstanding(libxl__ao *ao)
      * decrement progress_reports_outstanding, and call
      * libxl__ao_complete_check_progress_reports.
      */
-    return !ao->complete || ao->progress_reports_outstanding;
+    return !ao->complete || ao->progress_reports_outstanding
+        || ao->outstanding_killed_child;
 }
 
 void libxl__ao_complete_check_progress_reports(libxl__egc *egc, libxl__ao *ao)
diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index eea3d5d4e68e..d99d40107f71 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -678,6 +678,61 @@  int libxl__ev_child_xenstore_reopen(libxl__gc *gc, const char *what) {
     return rc;
 }
 
+typedef struct ev_child_killed {
+    libxl__ao *ao;
+    libxl__ev_child ch;
+} ev_child_killed;
+static void deregistered_child_callback(libxl__egc *, libxl__ev_child *,
+                                        pid_t, int status);
+
+/*
+ * Allow to send a SIGKILL signal to a child and deregister the death
+ * callback.
+ * state: Active/Idle -> Idle
+ */
+void libxl__ev_child_kill(libxl__ao *ao, libxl__ev_child *ch)
+{
+    AO_GC;
+
+    if (!libxl__ev_child_inuse(ch))
+        return;
+
+    pid_t pid = ch->pid;
+
+    ev_child_killed *new_ch = GCNEW(new_ch);
+    new_ch->ao = ao;
+    new_ch->ch.pid = pid;
+    new_ch->ch.callback = deregistered_child_callback;
+    LIBXL_LIST_INSERT_HEAD(&CTX->children, &new_ch->ch, entry);
+    ao->outstanding_killed_child++;
+
+    LIBXL_LIST_REMOVE(ch, entry);
+    ch->pid = -1;
+    int r = kill(pid, SIGKILL);
+    if (r)
+        LOGE(ERROR, "failed to kill child [%ld]",
+             (unsigned long)pid);
+}
+
+static void deregistered_child_callback(libxl__egc *egc,
+                                        libxl__ev_child *ch,
+                                        pid_t pid,
+                                        int status)
+{
+    ev_child_killed *ck = CONTAINER_OF(ch, *ck, ch);
+    EGC_GC;
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, XTL_ERROR,
+                                      "killed fork (dying as expected)",
+                                      pid, status);
+    } else {
+        LOG(DEBUG, "killed child exit cleanly, unexpected");
+    }
+    ck->ao->outstanding_killed_child--;
+    libxl__ao_complete_check_progress_reports(egc, ck->ao);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d2d5af746b50..5823890703ad 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -727,6 +727,7 @@  struct libxl__ao {
     libxl__poller *poller;
     uint32_t domid;
     LIBXL_TAILQ_ENTRY(libxl__ao) entry_for_callback;
+    int outstanding_killed_child;
 };
 
 #define LIBXL_INIT_GC(gc,ctx) do{               \
@@ -1168,6 +1169,13 @@  static inline int libxl__ev_child_inuse(const libxl__ev_child *childw_out)
  * message>". */
 _hidden int libxl__ev_child_xenstore_reopen(libxl__gc *gc, const char *what);
 
+/*
+ * Allow to send a SIGKILL signal to a child and deregister the death
+ * callback.
+ * state: Active/Idle -> Idle
+ */
+_hidden void libxl__ev_child_kill(libxl__ao *ao, libxl__ev_child *ch);
+
 
 /*
  * Other event-handling support provided by the libxl event core to