diff mbox series

[PULL,5/5] nbd/server: CVE-2024-7409: Close stray clients at server-stop

Message ID 20240808215529.1065336-12-eblake@redhat.com (mailing list archive)
State New, archived
Headers show
Series [PULL,1/5] nbd: Minor style and typo fixes | expand

Commit Message

Eric Blake Aug. 8, 2024, 9:53 p.m. UTC
A malicious client can attempt to connect to an NBD server, and then
intentionally delay progress in the handshake, including if it does
not know the TLS secrets.  Although the previous two patches reduce
this behavior by capping the default max-connections parameter and
killing slow clients, they did not eliminate the possibility of a
client waiting to close the socket until after the QMP nbd-server-stop
command is executed, at which point qemu would SEGV when trying to
dereference the NULL nbd_server global which is no longer present.
This amounts to a denial of service attack.  Worse, if another NBD
server is started before the malicious client disconnects, I cannot
rule out additional adverse effects when the old client interferes
with the connection count of the new server (although the most likely
is a crash due to an assertion failure when checking
nbd_server->connections > 0).

For environments without this patch, the CVE can be mitigated by
ensuring (such as via a firewall) that only trusted clients can
connect to an NBD server.  Note that using frameworks like libvirt
that ensure that TLS is used and that nbd-server-stop is not executed
while any trusted clients are still connected will only help if there
is also no possibility for an untrusted client to open a connection
but then stall on the NBD handshake.

Given the previous patches, it would be possible to guarantee that no
clients remain connected by having nbd-server-stop sleep for longer
than the default handshake deadline before finally freeing the global
nbd_server object, but that could make QMP non-responsive for a long
time.  So intead, this patch fixes the problem by tracking all client
sockets opened while the server is running, and forcefully closing any
such sockets remaining without a completed handshake at the time of
nbd-server-stop, then waiting until the coroutines servicing those
sockets notice the state change.  nbd-server-stop now has a second
AIO_WAIT_WHILE_UNLOCKED (the first is indirectly through the
blk_exp_close_all_type() that disconnects all clients that completed
handshakes), but forced socket shutdown is enough to progress the
coroutines and quickly tear down all clients before the server is
freed, thus finally fixing the CVE.

This patch relies heavily on the fact that nbd/server.c guarantees
that it only calls nbd_blockdev_client_closed() from the main loop
(see the assertion in nbd_client_put() and the hoops used in
nbd_client_put_nonzero() to achieve that); if we did not have that
guarantee, we would also need a mutex protecting our accesses of the
list of connections to survive re-entrancy from independent iothreads.

Although I did not actually try to test old builds, it looks like this
problem has existed since at least commit 862172f45c (v2.12.0, 2017) -
even back when that patch started using a QIONetListener to handle
listening on multiple sockets, nbd_server_free() was already unaware
that the nbd_blockdev_client_closed callback can be reached later by a
client thread that has not completed handshakes (and therefore the
client's socket never got added to the list closed in
nbd_export_close_all), despite that patch intentionally tearing down
the QIONetListener to prevent new clients.

Reported-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Fixes: CVE-2024-7409
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-14-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 blockdev-nbd.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

Comments

Michael Tokarev Aug. 11, 2024, 8:02 a.m. UTC | #1
09.08.2024 00:53, Eric Blake wrote:
> A malicious client can attempt to connect to an NBD server, and then
> intentionally delay progress in the handshake, including if it does
> not know the TLS secrets.  Although the previous two patches reduce
> this behavior by capping the default max-connections parameter and
> killing slow clients, they did not eliminate the possibility of a
> client waiting to close the socket until after the QMP nbd-server-stop
> command is executed, at which point qemu would SEGV when trying to
> dereference the NULL nbd_server global which is no longer present.
> This amounts to a denial of service attack.  Worse, if another NBD
> server is started before the malicious client disconnects, I cannot
> rule out additional adverse effects when the old client interferes
> with the connection count of the new server (although the most likely
> is a crash due to an assertion failure when checking
> nbd_server->connections > 0).
> 
> For environments without this patch, the CVE can be mitigated by
> ensuring (such as via a firewall) that only trusted clients can
> connect to an NBD server.  Note that using frameworks like libvirt
> that ensure that TLS is used and that nbd-server-stop is not executed
> while any trusted clients are still connected will only help if there
> is also no possibility for an untrusted client to open a connection
> but then stall on the NBD handshake.
> 
> Given the previous patches, it would be possible to guarantee that no
> clients remain connected by having nbd-server-stop sleep for longer
> than the default handshake deadline before finally freeing the global
> nbd_server object, but that could make QMP non-responsive for a long
> time.  So intead, this patch fixes the problem by tracking all client
> sockets opened while the server is running, and forcefully closing any
> such sockets remaining without a completed handshake at the time of
> nbd-server-stop, then waiting until the coroutines servicing those
> sockets notice the state change.  nbd-server-stop now has a second
> AIO_WAIT_WHILE_UNLOCKED (the first is indirectly through the
> blk_exp_close_all_type() that disconnects all clients that completed
> handshakes), but forced socket shutdown is enough to progress the
> coroutines and quickly tear down all clients before the server is
> freed, thus finally fixing the CVE.
> 
> This patch relies heavily on the fact that nbd/server.c guarantees
> that it only calls nbd_blockdev_client_closed() from the main loop
> (see the assertion in nbd_client_put() and the hoops used in
> nbd_client_put_nonzero() to achieve that); if we did not have that
> guarantee, we would also need a mutex protecting our accesses of the
> list of connections to survive re-entrancy from independent iothreads.
> 
> Although I did not actually try to test old builds, it looks like this
> problem has existed since at least commit 862172f45c (v2.12.0, 2017) -
> even back when that patch started using a QIONetListener to handle
> listening on multiple sockets, nbd_server_free() was already unaware
> that the nbd_blockdev_client_closed callback can be reached later by a
> client thread that has not completed handshakes (and therefore the
> client's socket never got added to the list closed in
> nbd_export_close_all), despite that patch intentionally tearing down
> the QIONetListener to prevent new clients.

Eric, from the 5-patch series, only this last patch is Cc'd for stable,
but it obviously does not work without all 4 previous patches.  Do you
mean whole series should be applied to -stable?

I picked up patches 2-5 for 7.2 and 9.0.

Thanks,

/mjt
Eric Blake Aug. 12, 2024, 2:44 p.m. UTC | #2
On Sun, Aug 11, 2024 at 11:02:52AM GMT, Michael Tokarev wrote:
> 09.08.2024 00:53, Eric Blake wrote:
> > A malicious client can attempt to connect to an NBD server, and then
> > intentionally delay progress in the handshake, including if it does
> > not know the TLS secrets.  Although the previous two patches reduce

> 
> Eric, from the 5-patch series, only this last patch is Cc'd for stable,
> but it obviously does not work without all 4 previous patches.  Do you
> mean whole series should be applied to -stable?
> 
> I picked up patches 2-5 for 7.2 and 9.0.

You are correct that patch 5 in isolation won't work due to missing
pre-reqs, but also that 1 is fluff that doesn't need backporting; my
apologies for not more judiciously adding the cc to all 4 patches
worth the backport effort.  I'm in the middle of efforts to backport
only 2-5 to various RHEL releases, so your choice to do the same for
7.2 and 9.0 matches what I'm doing downstream.
Michael Tokarev Aug. 13, 2024, 6:30 a.m. UTC | #3
[Trim CC list]

12.08.2024 17:44, Eric Blake wrote:
> On Sun, Aug 11, 2024 at 11:02:52AM GMT, Michael Tokarev wrote:
..
>> Eric, from the 5-patch series, only this last patch is Cc'd for stable,
>> but it obviously does not work without all 4 previous patches.  Do you
>> mean whole series should be applied to -stable?
>>
>> I picked up patches 2-5 for 7.2 and 9.0.
> 
> You are correct that patch 5 in isolation won't work due to missing
> pre-reqs, but also that 1 is fluff that doesn't need backporting; my
> apologies for not more judiciously adding the cc to all 4 patches
> worth the backport effort.  I'm in the middle of efforts to backport
> only 2-5 to various RHEL releases, so your choice to do the same for
> 7.2 and 9.0 matches what I'm doing downstream.

That's entirely okay, there's no reason to apologize, - I just wanted
to make sure I got it all correct, nothing more, - it's a purely working
moment.

Speaking of various RHEL etc releases, - maybe we should keep more stable
branches, sort of like I do with 7.2 which is used in Debian?  Or is it
maybe a bad idea?

Thanks,

/mjt
diff mbox series

Patch

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 24ba5382db0..f73409ae494 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -21,12 +21,18 @@ 
 #include "io/channel-socket.h"
 #include "io/net-listener.h"

+typedef struct NBDConn {
+    QIOChannelSocket *cioc;
+    QLIST_ENTRY(NBDConn) next;
+} NBDConn;
+
 typedef struct NBDServerData {
     QIONetListener *listener;
     QCryptoTLSCreds *tlscreds;
     char *tlsauthz;
     uint32_t max_connections;
     uint32_t connections;
+    QLIST_HEAD(, NBDConn) conns;
 } NBDServerData;

 static NBDServerData *nbd_server;
@@ -51,6 +57,14 @@  int nbd_server_max_connections(void)

 static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
 {
+    NBDConn *conn = nbd_client_owner(client);
+
+    assert(qemu_in_main_thread() && nbd_server);
+
+    object_unref(OBJECT(conn->cioc));
+    QLIST_REMOVE(conn, next);
+    g_free(conn);
+
     nbd_client_put(client);
     assert(nbd_server->connections > 0);
     nbd_server->connections--;
@@ -60,14 +74,20 @@  static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
 static void nbd_accept(QIONetListener *listener, QIOChannelSocket *cioc,
                        gpointer opaque)
 {
+    NBDConn *conn = g_new0(NBDConn, 1);
+
+    assert(qemu_in_main_thread() && nbd_server);
     nbd_server->connections++;
+    object_ref(OBJECT(cioc));
+    conn->cioc = cioc;
+    QLIST_INSERT_HEAD(&nbd_server->conns, conn, next);
     nbd_update_server_watch(nbd_server);

     qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server");
     /* TODO - expose handshake timeout as QMP option */
     nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS,
                    nbd_server->tlscreds, nbd_server->tlsauthz,
-                   nbd_blockdev_client_closed, NULL);
+                   nbd_blockdev_client_closed, conn);
 }

 static void nbd_update_server_watch(NBDServerData *s)
@@ -81,12 +101,25 @@  static void nbd_update_server_watch(NBDServerData *s)

 static void nbd_server_free(NBDServerData *server)
 {
+    NBDConn *conn, *tmp;
+
     if (!server) {
         return;
     }

+    /*
+     * Forcefully close the listener socket, and any clients that have
+     * not yet disconnected on their own.
+     */
     qio_net_listener_disconnect(server->listener);
     object_unref(OBJECT(server->listener));
+    QLIST_FOREACH_SAFE(conn, &server->conns, next, tmp) {
+        qio_channel_shutdown(QIO_CHANNEL(conn->cioc), QIO_CHANNEL_SHUTDOWN_BOTH,
+                             NULL);
+    }
+
+    AIO_WAIT_WHILE_UNLOCKED(NULL, server->connections > 0);
+
     if (server->tlscreds) {
         object_unref(OBJECT(server->tlscreds));
     }