diff mbox series

ceph-mgr: requests to restful api get blocked sometimes

Message ID CAKQB+fvpwYHV8FEuX7NNCaTm1GNtUqtP3gFE=kX34+2iux2xvg@mail.gmail.com (mailing list archive)
State New, archived
Headers show
Series ceph-mgr: requests to restful api get blocked sometimes | expand

Commit Message

Jerry Lee Oct. 31, 2018, 8:10 a.m. UTC
Hi,

We setup a ceph cluster (v12.2.2) with restful api plugin running, but
sometimes requests got blocked forever without responding.  While
stucking in such condition, we checked the netstat output and it shown
some packets were queued in the Recv-Q:

[~] netstat -tupln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address
State       PID/Program name
tcp      129      0 192.168.2.1:8003        0.0.0.0:*
LISTEN      1885/ceph-mgr

And a log which may be related to the issue is captured:
2018-10-29 13:43:00.058319 7fcd1891b700  1 mgr[restful] Unknown
request '140518797573648:0'

After digging into the codes, should the requests list be protected by
the requests_lock as the following patch?  A possible condition we
suspect is that a request is done and the restful plugin is notified.
But unfortunately, the request is not appended to the requests list
yet which makes a "Unknown request" log is generated and the
submit_reqeust() function waits forever without acceping new request.



Any idea and feedback are appreciated, thanks.

- Jerry

Comments

Boris Ranto Oct. 31, 2018, 11:52 a.m. UTC | #1
On Wed, Oct 31, 2018 at 9:11 AM Jerry Lee <leisurelysw24@gmail.com> wrote:
>
> Hi,
>

Hi,

> We setup a ceph cluster (v12.2.2) with restful api plugin running, but
> sometimes requests got blocked forever without responding.  While
> stucking in such condition, we checked the netstat output and it shown
> some packets were queued in the Recv-Q:
>
> [~] netstat -tupln
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address
> State       PID/Program name
> tcp      129      0 192.168.2.1:8003        0.0.0.0:*
> LISTEN      1885/ceph-mgr
>
> And a log which may be related to the issue is captured:
> 2018-10-29 13:43:00.058319 7fcd1891b700  1 mgr[restful] Unknown
> request '140518797573648:0'
>
> After digging into the codes, should the requests list be protected by
> the requests_lock as the following patch?  A possible condition we
> suspect is that a request is done and the restful plugin is notified.
> But unfortunately, the request is not appended to the requests list
> yet which makes a "Unknown request" log is generated and the
> submit_reqeust() function waits forever without acceping new request.
>
> diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py
> index 6ce610b..bbe88ab 100644
> --- a/src/pybind/mgr/restful/module.py
> +++ b/src/pybind/mgr/restful/module.py
> @@ -363,9 +363,10 @@ class Module(MgrModule):
>              if tag == 'seq':
>                  return
>
> -            request = filter(
> -                lambda x: x.is_running(tag),
> -                self.requests)
> +            with self.requests_lock:
> +                request = filter(
> +                        lambda x: x.is_running(tag),
> +                        self.requests)
>
>              if len(request) != 1:
>                  self.log.warn("Unknown request '%s'" % str(tag))
> @@ -596,8 +597,8 @@ class Module(MgrModule):
>
>
>      def submit_request(self, _request, **kwargs):
> -        request = CommandsRequest(_request)
>          with self.requests_lock:
> +            request = CommandsRequest(_request)
>              self.requests.append(request)
>          if kwargs.get('wait', 0):
>              while not request.is_finished():
>
>
> Any idea and feedback are appreciated, thanks.
>

Do you pass ?wait=1 to the URL so that submit_request actually waits
for the request to finish? Otherwise, the call should be non-blocking.

In any case, I suspect you might be right. The request
(CommandsRequest) creation can fire up the notify event early and this
can cause a race condition where the actual request was not yet added
to the self.requests list so it won't be recognized in the notify
function. The patch looks good to me. Just note that the notify
function was modified slightly in current master so that it does not
print the 'Unknown request' warnings.

Have you been able to successfully test your patch? Can you create a PR?

-boris

> - Jerry
Jerry Lee Nov. 1, 2018, 3:28 a.m. UTC | #2
On Wed, 31 Oct 2018 at 19:52, Boris Ranto <branto@redhat.com> wrote:
>
> On Wed, Oct 31, 2018 at 9:11 AM Jerry Lee <leisurelysw24@gmail.com> wrote:
> >
> > Hi,
> >
>
> Hi,
>
> > We setup a ceph cluster (v12.2.2) with restful api plugin running, but
> > sometimes requests got blocked forever without responding.  While
> > stucking in such condition, we checked the netstat output and it shown
> > some packets were queued in the Recv-Q:
> >
> > [~] netstat -tupln
> > Active Internet connections (only servers)
> > Proto Recv-Q Send-Q Local Address           Foreign Address
> > State       PID/Program name
> > tcp      129      0 192.168.2.1:8003        0.0.0.0:*
> > LISTEN      1885/ceph-mgr
> >
> > And a log which may be related to the issue is captured:
> > 2018-10-29 13:43:00.058319 7fcd1891b700  1 mgr[restful] Unknown
> > request '140518797573648:0'
> >
> > After digging into the codes, should the requests list be protected by
> > the requests_lock as the following patch?  A possible condition we
> > suspect is that a request is done and the restful plugin is notified.
> > But unfortunately, the request is not appended to the requests list
> > yet which makes a "Unknown request" log is generated and the
> > submit_reqeust() function waits forever without acceping new request.
> >
> > diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py
> > index 6ce610b..bbe88ab 100644
> > --- a/src/pybind/mgr/restful/module.py
> > +++ b/src/pybind/mgr/restful/module.py
> > @@ -363,9 +363,10 @@ class Module(MgrModule):
> >              if tag == 'seq':
> >                  return
> >
> > -            request = filter(
> > -                lambda x: x.is_running(tag),
> > -                self.requests)
> > +            with self.requests_lock:
> > +                request = filter(
> > +                        lambda x: x.is_running(tag),
> > +                        self.requests)
> >
> >              if len(request) != 1:
> >                  self.log.warn("Unknown request '%s'" % str(tag))
> > @@ -596,8 +597,8 @@ class Module(MgrModule):
> >
> >
> >      def submit_request(self, _request, **kwargs):
> > -        request = CommandsRequest(_request)
> >          with self.requests_lock:
> > +            request = CommandsRequest(_request)
> >              self.requests.append(request)
> >          if kwargs.get('wait', 0):
> >              while not request.is_finished():
> >
> >
> > Any idea and feedback are appreciated, thanks.
> >
>
> Do you pass ?wait=1 to the URL so that submit_request actually waits
> for the request to finish? Otherwise, the call should be non-blocking.
>

Yes.

> In any case, I suspect you might be right. The request
> (CommandsRequest) creation can fire up the notify event early and this
> can cause a race condition where the actual request was not yet added
> to the self.requests list so it won't be recognized in the notify
> function. The patch looks good to me. Just note that the notify
> function was modified slightly in current master so that it does not
> print the 'Unknown request' warnings.
>
> Have you been able to successfully test your patch? Can you create a PR?

We basically tested the patch already and keep monitoring the cluster
status for several days to make sure whether the same issue occurs or
not.  We can create a PR for this issue.  Thanks for the feedback :)

- Jerry

>
> -boris
>
> > - Jerry
diff mbox series

Patch

diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py
index 6ce610b..bbe88ab 100644
--- a/src/pybind/mgr/restful/module.py
+++ b/src/pybind/mgr/restful/module.py
@@ -363,9 +363,10 @@  class Module(MgrModule):
             if tag == 'seq':
                 return

-            request = filter(
-                lambda x: x.is_running(tag),
-                self.requests)
+            with self.requests_lock:
+                request = filter(
+                        lambda x: x.is_running(tag),
+                        self.requests)

             if len(request) != 1:
                 self.log.warn("Unknown request '%s'" % str(tag))
@@ -596,8 +597,8 @@  class Module(MgrModule):


     def submit_request(self, _request, **kwargs):
-        request = CommandsRequest(_request)
         with self.requests_lock:
+            request = CommandsRequest(_request)
             self.requests.append(request)
         if kwargs.get('wait', 0):
             while not request.is_finished():