Message ID: 20180213202727.77175e21@suse.de
State: New, archived
On Tue, 2018-02-13 at 20:27 +0100, David Disseldorp wrote:
>
> +	instance_guid = rados_get_instance_id(state->cluster);
> +
> +	ret = ceph_service_instance_gen(instance_buf, sizeof(instance_buf),
> +					daemon, instance_guid);
> +	if (ret < 0) {
> +		goto out_mem_free;
> +	}
> +
> +	DBG_DEBUG("registering as %s with %s Ceph cluster\n", instance_buf,
> +		  (cluster_name != NULL ? cluster_name : "default"));
> +
> +	ret = rados_service_register(state->cluster, "samba", instance_buf, "");

IIUC, if the program (samba in this case) dies, and then is restarted
(maybe host reboots?), it'll get a new instance_guid, right? Or am I
misunderstanding what rados_get_instance_id will return here?
Hi Jeff,

On Thu, 15 Feb 2018 13:09:45 -0500, Jeff Layton wrote:

> IIUC, if the program (samba in this case) dies, and then is restarted
> (maybe host reboots?), it'll get a new instance_guid, right? Or am I
> misunderstanding what rados_get_instance_id will return here?

Yes, that's right. On smbd restart it'll reconnect and get a new
instance ID. The hostname is carried in the metadata, which makes it a
little easier to identify what's going on from the Ceph side.

AFAICT the service record (unfortunately) remains present after
rados_shutdown() until a timeout on the manager is tripped, so
it's quite possible to have two registrations for the same host for a
short amount of time.

Cheers, David
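
A note for readers following the thread: rados_get_instance_id() returns
the global ID of the librados client session, the same value embedded in
the service name above, so every reconnect yields a fresh ID. A small
self-contained C sketch (not from the thread; assumes a reachable cluster
and the default ceph.conf search path) demonstrating the behaviour:

	#include <stdio.h>
	#include <rados/librados.h>

	/*
	 * Connect twice and print both instance IDs. The ID is scoped to
	 * the client session, not the host, so the two values differ --
	 * which is why a restarted smbd re-registers under a new name.
	 */
	static uint64_t connect_and_get_id(void)
	{
		rados_t cluster;
		uint64_t id = 0;

		if (rados_create(&cluster, NULL) < 0) {
			return 0;
		}
		/* NULL path: search the default ceph.conf locations */
		if (rados_conf_read_file(cluster, NULL) == 0 &&
		    rados_connect(cluster) == 0) {
			id = rados_get_instance_id(cluster);
		}
		rados_shutdown(cluster);
		return id;
	}

	int main(void)
	{
		printf("first:  0x%016llx\n",
		       (unsigned long long)connect_and_get_id());
		printf("second: 0x%016llx\n",
		       (unsigned long long)connect_and_get_id());
		return 0;
	}

Both prints come from the same host and user, yet the IDs differ, which
is exactly why the stale first registration can briefly coexist with the
second one, as David describes.
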
On Thu, 2018-02-15 at 19:52 +0100, David Disseldorp wrote:
> Hi Jeff,
>
> On Thu, 15 Feb 2018 13:09:45 -0500, Jeff Layton wrote:
>
> > IIUC, if the program (samba in this case) dies, and then is restarted
> > (maybe host reboots?), it'll get a new instance_guid, right? Or am I
> > misunderstanding what rados_get_instance_id will return here?
>
> Yes, that's right. On smbd restart it'll reconnect and get a new
> instance ID. The hostname is carried in the metadata, which makes it a
> little easier to identify what's going on from the Ceph side.
>
> AFAICT the service record (unfortunately) remains present after
> rados_shutdown() until a timeout on the manager is tripped, so
> it's quite possible to have two registrations for the same host for a
> short amount of time.
>
> Cheers, David

Got it, thanks. I guess the fact that it changes is not a problem here?
I'm not that well versed in what ceph-mgr does, tbqh...

One thing you may want to be careful about: I found some thread safety
problems a few months ago in how CephContext objects are handled in the
ceph code. While I patched them up as best I could, a lot of older
shipping versions still have those bugs. Even with those fixes, I think
there may still be lurking problems when we have multiple ceph/rados
clients within the same process image.

The upshot here is that if you have the rados client here and (e.g.) a
vfs_ceph client for exporting cephfs, you can hit some races
particularly on setup and teardown of those clients that can cause
crashes.

libcephfs has a ceph_create_with_context function and librados has
something similar. You might consider having a truly global cct object,
and then have the vfs_ceph create a client handle with the same object.
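
A rough sketch of the suggestion above (not from the thread; assumes the
public rados_cct() and ceph_create_with_context() entry points, and the
cast relies on rados_config_t being an opaque handle to the underlying
CephContext):

	#include <rados/librados.h>
	#include <cephfs/libcephfs.h>

	/*
	 * Stand up one librados cluster handle, then create the libcephfs
	 * mount from the same CephContext, so both clients share a single
	 * context instead of racing on two independent ones.
	 */
	static int create_shared_clients(rados_t *cluster,
					 struct ceph_mount_info **cmount)
	{
		int ret;

		ret = rados_create(cluster, NULL);
		if (ret < 0) {
			return ret;
		}
		ret = rados_conf_read_file(*cluster, NULL);
		if (ret < 0) {
			goto err_shutdown;
		}
		ret = rados_connect(*cluster);
		if (ret < 0) {
			goto err_shutdown;
		}

		/* hand librados' CephContext to libcephfs */
		ret = ceph_create_with_context(cmount,
				(struct CephContext *)rados_cct(*cluster));
		if (ret < 0) {
			goto err_shutdown;
		}
		return 0;

	err_shutdown:
		rados_shutdown(*cluster);
		return ret;
	}

The returned cmount would still need the usual ceph_mount() call; whether
this actually dodges the setup/teardown races Jeff mentions depends on
the Ceph version in play.
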
On Thu, Feb 15, 2018 at 6:52 PM, David Disseldorp <ddiss@suse.de> wrote:
> Hi Jeff,
>
> On Thu, 15 Feb 2018 13:09:45 -0500, Jeff Layton wrote:
>
>> IIUC, if the program (samba in this case) dies, and then is restarted
>> (maybe host reboots?), it'll get a new instance_guid, right? Or am I
>> misunderstanding what rados_get_instance_id will return here?
>
> Yes, that's right. On smbd restart it'll reconnect and get a new
> instance ID. The hostname is carried in the metadata, which makes it a
> little easier to identify what's going on from the Ceph side.

As you say, the hostname is what most people are interested in most of
the time anyway.

Using a guid here is also going to be similar to what people see
when running Ceph services in auto-scaled groups of Kubernetes
containers -- lovingly hand-named daemons will be a thing of the past,
and people will be dealing either with an ephemeral ID, or with the
hostname from the metadata.

The part where dead instances stick around in the status until they
time out is annoying. In container environments we will need some
logic for culling them when the container dies. In other environments
we would either need to add the ability to do the explicit naming of
the daemon, or just put up with the stale entries.

John

> AFAICT the service record (unfortunately) remains present after
> rados_shutdown() until a timeout on the manager is tripped, so
> it's quite possible to have two registrations for the same host for a
> short amount of time.
>
> Cheers, David
On Mon, Feb 19, 2018 at 7:27 AM, John Spray <jspray@redhat.com> wrote:
> On Thu, Feb 15, 2018 at 6:52 PM, David Disseldorp <ddiss@suse.de> wrote:
>> Hi Jeff,
>>
>> On Thu, 15 Feb 2018 13:09:45 -0500, Jeff Layton wrote:
>>
>>> IIUC, if the program (samba in this case) dies, and then is restarted
>>> (maybe host reboots?), it'll get a new instance_guid, right? Or am I
>>> misunderstanding what rados_get_instance_id will return here?
>>
>> Yes, that's right. On smbd restart it'll reconnect and get a new
>> instance ID. The hostname is carried in the metadata, which makes it a
>> little easier to identify what's going on from the Ceph side.
>
> As you say, the hostname is what most people are interested in most of
> the time anyway.
>
> Using a guid here is also going to be similar to what people see
> when running Ceph services in auto-scaled groups of Kubernetes
> containers -- lovingly hand-named daemons will be a thing of the past,
> and people will be dealing either with an ephemeral ID, or with the
> hostname from the metadata.
>
> The part where dead instances stick around in the status until they
> time out is annoying. In container environments we will need some
> logic for culling them when the container dies. In other environments
> we would either need to add the ability to do the explicit naming of
> the daemon, or just put up with the stale entries.

You know, OSDs report to the monitor when they’re going down so that
their peers don’t need to wait through a timeout period. Should we
consider doing the same when a declared service shuts down?
-Greg
On Mon, Feb 19, 2018 at 5:28 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Mon, Feb 19, 2018 at 7:27 AM, John Spray <jspray@redhat.com> wrote:
>> On Thu, Feb 15, 2018 at 6:52 PM, David Disseldorp <ddiss@suse.de> wrote:
>>> Hi Jeff,
>>>
>>> On Thu, 15 Feb 2018 13:09:45 -0500, Jeff Layton wrote:
>>>
>>>> IIUC, if the program (samba in this case) dies, and then is restarted
>>>> (maybe host reboots?), it'll get a new instance_guid, right? Or am I
>>>> misunderstanding what rados_get_instance_id will return here?
>>>
>>> Yes, that's right. On smbd restart it'll reconnect and get a new
>>> instance ID. The hostname is carried in the metadata, which makes it a
>>> little easier to identify what's going on from the Ceph side.
>>
>> As you say, the hostname is what most people are interested in most of
>> the time anyway.
>>
>> Using a guid here is also going to be similar to what people see
>> when running Ceph services in auto-scaled groups of Kubernetes
>> containers -- lovingly hand-named daemons will be a thing of the past,
>> and people will be dealing either with an ephemeral ID, or with the
>> hostname from the metadata.
>>
>> The part where dead instances stick around in the status until they
>> time out is annoying. In container environments we will need some
>> logic for culling them when the container dies. In other environments
>> we would either need to add the ability to do the explicit naming of
>> the daemon, or just put up with the stale entries.
>
> You know, OSDs report to the monitor when they’re going down so that
> their peers don’t need to wait through a timeout period. Should we
> consider doing the same when a declared service shuts down?

Yes, absolutely -- my head was stuck in the worst case (general case) path.

Need a new shutdown message (or flag on one of the existing messages)
and MgrClient should send it in ::shutdown if session &&
service_daemon. This will cause a little shutdown delay in librados,
but I think it's acceptable as it's only for service_daemon users.
http://tracker.ceph.com/issues/23042

John
On Mon, 19 Feb 2018, John Spray wrote:
> On Mon, Feb 19, 2018 at 5:28 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> > On Mon, Feb 19, 2018 at 7:27 AM, John Spray <jspray@redhat.com> wrote:
> >> On Thu, Feb 15, 2018 at 6:52 PM, David Disseldorp <ddiss@suse.de> wrote:
> >>> Hi Jeff,
> >>>
> >>> On Thu, 15 Feb 2018 13:09:45 -0500, Jeff Layton wrote:
> >>>
> >>>> IIUC, if the program (samba in this case) dies, and then is restarted
> >>>> (maybe host reboots?), it'll get a new instance_guid, right? Or am I
> >>>> misunderstanding what rados_get_instance_id will return here?
> >>>
> >>> Yes, that's right. On smbd restart it'll reconnect and get a new
> >>> instance ID. The hostname is carried in the metadata, which makes it a
> >>> little easier to identify what's going on from the Ceph side.
> >>
> >> As you say, the hostname is what most people are interested in most of
> >> the time anyway.
> >>
> >> Using a guid here is also going to be similar to what people see
> >> when running Ceph services in auto-scaled groups of Kubernetes
> >> containers -- lovingly hand-named daemons will be a thing of the past,
> >> and people will be dealing either with an ephemeral ID, or with the
> >> hostname from the metadata.
> >>
> >> The part where dead instances stick around in the status until they
> >> time out is annoying. In container environments we will need some
> >> logic for culling them when the container dies. In other environments
> >> we would either need to add the ability to do the explicit naming of
> >> the daemon, or just put up with the stale entries.
> >
> > You know, OSDs report to the monitor when they’re going down so that
> > their peers don’t need to wait through a timeout period. Should we
> > consider doing the same when a declared service shuts down?
>
> Yes, absolutely -- my head was stuck in the worst case (general case) path.
>
> Need a new shutdown message (or flag on one of the existing messages)
> and MgrClient should send it in ::shutdown if session &&
> service_daemon. This will cause a little shutdown delay in librados,
> but I think it's acceptable as it's only for service_daemon users.
> http://tracker.ceph.com/issues/23042

Yes!

https://trello.com/c/wZdg9Yh8/326-mgrclient-deregister-on-shutdown

sage
From 7b5a4217f40e00cf43bb681c8e673ae4451a2d0d Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Wed, 7 Feb 2018 14:58:48 +0100
Subject: [PATCH 1/2] ceph: add rados service integration

On startup, Samba advertises its presence to the Ceph Manager Service
via rados_service_register(). librados spawns a number of threads to
maintain the cluster connection and send periodic service heartbeats.

Signed-off-by: David Disseldorp <ddiss@samba.org>
---
 source3/smbd/rados_service.c | 146 +++++++++++++++++++++++++++++++++++++++++++
 source3/smbd/rados_service.h |  32 ++++++++++
 source3/smbd/server.c        |   7 +++
 source3/wscript              |  10 ++-
 source3/wscript_build        |  12 +++-
 5 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100644 source3/smbd/rados_service.c
 create mode 100644 source3/smbd/rados_service.h

diff --git a/source3/smbd/rados_service.c b/source3/smbd/rados_service.c
new file mode 100644
index 00000000000..0e47e759ea7
--- /dev/null
+++ b/source3/smbd/rados_service.c
@@ -0,0 +1,146 @@
+/*
+   Copyright (C) David Disseldorp 2018
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include "source3/include/includes.h"
+#include <rados/librados.h>
+#include "rados_service.h"
+
+struct ceph_service_state {
+	pid_t svc_reg_pid;
+	rados_t cluster;
+};
+
+static int ceph_service_destroy(struct ceph_service_state *state)
+{
+	/*
+	 * ensure that processes forked after spawning the service
+	 * registration connection don't additionally attempt cleanup.
+	 */
+	if (state->svc_reg_pid != getpid()) {
+		DBG_DEBUG("ignoring non-spawner destructor\n");
+		return 0;
+	}
+
+	DBG_DEBUG("ending Ceph cluster service registration\n");
+	rados_shutdown(state->cluster);
+	DBG_DEBUG("rados service connection shutdown\n");
+	return 0;
+}
+
+static int ceph_service_instance_gen(char *buf, size_t buflen,
+				     const char *daemon,
+				     uint64_t instance_guid)
+{
+	int ret;
+
+	ret = snprintf(buf, buflen, "%s:0x%016llx", daemon,
+		       (unsigned long long)instance_guid);
+	if (ret >= buflen) {
+		DBG_ERR("Ceph instance name too long, skipping service "
+			"registration\n");
+		return -ENAMETOOLONG;
+	}
+
+	return 0;
+}
+
+int ceph_service_register(struct tevent_context *ev_ctx, const char *daemon)
+{
+	int ret;
+	const char *user_id = NULL;
+	const char *conf_file = NULL;
+	const char *cluster_name = NULL;
+	uint64_t instance_guid;
+	char instance_buf[128];
+	struct ceph_service_state *state = talloc(ev_ctx,
+						  struct ceph_service_state);
+
+	if (state == NULL) {
+		return -ENOMEM;
+	}
+
+	state->svc_reg_pid = getpid();
+
+	/*
+	 * XXX This does not reuse the existing (share based) "ceph:user_id"
+	 * parameter, to allow for:
+	 * - running Samba with Ceph support, but without service registration.
+	 * - future mapping between Samba and Ceph users.
+	 */
+	user_id = lp_parm_const_string(GLOBAL_SECTION_SNUM, "ceph",
+				       "service_user_id", NULL);
+	if (user_id == NULL) {
+		DBG_DEBUG("built with Ceph support, but missing ceph:"
+			  "service_user_id configuration - skipping service "
+			  "registration\n");
+		ret = 0;
+		goto out_mem_free;
+	}
+
+	/* a NULL cluster_name or conf_file sees librados use the defaults */
+	cluster_name = lp_parm_const_string(GLOBAL_SECTION_SNUM, "ceph",
+					    "cluster_name", NULL);
+	conf_file = lp_parm_const_string(GLOBAL_SECTION_SNUM, "ceph",
+					 "config_file", NULL);
+
+	ret = rados_create2(&state->cluster, cluster_name, user_id, 0);
+	if (ret < 0) {
+		DBG_ERR("failed to create ceph cluster context\n");
+		goto out_mem_free;
+	}
+
+	ret = rados_conf_read_file(state->cluster, conf_file);
+	if (ret < 0) {
+		DBG_ERR("failed to parse Ceph cluster config: %s\n",
+			strerror(-ret));
+		goto err_cluster_free;
+	}
+
+	ret = rados_connect(state->cluster);
+	if (ret < 0) {
+		DBG_ERR("failed to connect to ceph cluster\n");
+		goto err_cluster_free;
+	}
+
+	instance_guid = rados_get_instance_id(state->cluster);
+
+	ret = ceph_service_instance_gen(instance_buf, sizeof(instance_buf),
+					daemon, instance_guid);
+	if (ret < 0) {
+		/* cluster is connected at this point, shut it down */
+		goto err_cluster_free;
+	}
+
+	DBG_DEBUG("registering as %s with %s Ceph cluster\n", instance_buf,
+		  (cluster_name != NULL ? cluster_name : "default"));
+
+	ret = rados_service_register(state->cluster, "samba", instance_buf, "");
+	if (ret < 0) {
+		DBG_ERR("failed to register service with ceph cluster\n");
+		goto err_cluster_free;
+	}
+
+	/* close cluster conn and drop service listing on server exit */
+	talloc_set_destructor(state, ceph_service_destroy);
+
+	return 0;
+
+err_cluster_free:
+	rados_shutdown(state->cluster);
+out_mem_free:
+	talloc_free(state);
+	return ret;
+}
diff --git a/source3/smbd/rados_service.h b/source3/smbd/rados_service.h
new file mode 100644
index 00000000000..ef544b79798
--- /dev/null
+++ b/source3/smbd/rados_service.h
@@ -0,0 +1,32 @@
+/*
+   Copyright (C) David Disseldorp 2018
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef _RADOS_SERVICE_H
+#define _RADOS_SERVICE_H
+
+#ifdef HAVE_LIBRADOS_SERVICES
+int ceph_service_register(struct tevent_context *ev_ctx, const char *daemon);
+#else
+/* built without librados service support, do nothing */
+static inline int ceph_service_register(struct tevent_context *ev_ctx,
+					const char *daemon)
+{
+	return 0;
+}
+#endif
+
+#endif /* _RADOS_SERVICE_H */
diff --git a/source3/smbd/server.c b/source3/smbd/server.c
index 99baf9d519d..d6629480eac 100644
--- a/source3/smbd/server.c
+++ b/source3/smbd/server.c
@@ -54,6 +54,7 @@
 #include "lib/util/sys_rw.h"
 #include "cleanupdb.h"
 #include "g_lock.h"
+#include "rados_service.h"
 
 #ifdef CLUSTER_SUPPORT
 #include "ctdb_protocol.h"
@@ -1593,6 +1594,7 @@ extern void build_options(bool screen);
 	struct smbd_parent_context *parent = NULL;
 	TALLOC_CTX *frame;
 	NTSTATUS status;
+	int ret;
 	struct tevent_context *ev_ctx;
 	struct messaging_context *msg_ctx;
 	struct server_id server_id;
@@ -2053,6 +2055,11 @@ extern void build_options(bool screen);
 	 *  -- bad things will happen if smbd is launched via inetd
 	 *  and we fork a copy of ourselves here */
 	if (is_daemon && !interactive) {
+		ret = ceph_service_register(ev_ctx, "smbd");
+		if (ret < 0) {
+			exit_daemon("Samba failed to register with Ceph "
+				    "cluster", -ret);
+		}
 		if (rpc_lsasd_daemon() == RPC_DAEMON_FORK) {
 			start_lsasd(ev_ctx, msg_ctx);
diff --git a/source3/wscript b/source3/wscript
index 6d2d94bae87..605bdca6e01 100644
--- a/source3/wscript
+++ b/source3/wscript
@@ -1544,14 +1544,18 @@ main() {
         conf.env['CCFLAGS_CEPHFS'] = "-D_FILE_OFFSET_BITS=64"
 
         if Options.options.libcephfs_dir:
+            conf.env['CPPPATH_RADOS'] = Options.options.libcephfs_dir + '/include'
             conf.env['CPPPATH_CEPHFS'] = Options.options.libcephfs_dir + '/include'
             conf.env['LIBPATH_CEPHFS'] = Options.options.libcephfs_dir + '/lib'
             conf.env['LIBPATH_CEPH-COMMON'] = Options.options.libcephfs_dir + '/lib/ceph'
+            conf.env['LIBPATH_RADOS'] = Options.options.libcephfs_dir + '/lib'
 
         if (Options.options.with_cephfs and
                 conf.CHECK_HEADERS('cephfs/libcephfs.h', False, False, 'cephfs') and
+                conf.CHECK_HEADERS('rados/librados.h', False, False, 'rados') and
                 conf.CHECK_LIB('cephfs', shlib=True) and
-                conf.CHECK_LIB('ceph-common', shlib=True)):
+                conf.CHECK_LIB('ceph-common', shlib=True) and
+                conf.CHECK_LIB('rados', shlib=True)):
             if Options.options.with_acl_support:
                 conf.DEFINE('HAVE_CEPH', '1')
                 if conf.CHECK_FUNCS_IN('ceph_statx', 'cephfs ceph-common',
@@ -1561,6 +1565,10 @@ main() {
                 Logs.warn("ceph support disabled due to --without-acl-support")
                 conf.undefine('HAVE_CEPH')
 
+    if conf.CHECK_FUNCS_IN('rados_service_register', 'rados ceph-common',
+                           headers='rados/librados.h'):
+        conf.DEFINE('HAVE_LIBRADOS_SERVICES', '1')
+
     if Options.options.with_glusterfs:
         conf.CHECK_CFG(package='glusterfs-api', args='"glusterfs-api >= 4" --cflags --libs',
                        msg='Checking for glusterfs-api >= 4', uselib_store="GFAPI")
diff --git a/source3/wscript_build b/source3/wscript_build
index 76c5d6e203b..64f4b7ecebd 100644
--- a/source3/wscript_build
+++ b/source3/wscript_build
@@ -1052,6 +1052,16 @@ bld.SAMBA3_SUBSYSTEM('SPOOLSSD',
                     RPC_SOCK_HELPER
                  ''')
 
+smbd_rados_deps = ''
+if bld.CONFIG_SET('HAVE_LIBRADOS_SERVICES'):
+    bld.SAMBA_LIBRARY('rados_service',
+                      source='smbd/rados_service.c',
+                      deps='SMBCONF_PARAM samba-debug',
+                      local_include=False,
+                      public_deps='rados ceph-common',
+                      private_library=True)
+    smbd_rados_deps += ' rados_service'
+
 ########################## BINARIES #################################
 
 bld.SAMBA3_BINARY('smbd/smbd',
@@ -1064,7 +1074,7 @@
bld.SAMBA3_BINARY('smbd/smbd',
                    FSSD
                    MDSSD
                    SPOOLSSD
-                   ''',
+                   ''' + smbd_rados_deps,
                    install_path='${SBINDIR}')
-- 
2.13.6

From 463681ff53b5b0f12c5b6b3be5db4d27aceda881 Mon Sep 17 00:00:00 2001
From: David Disseldorp <ddiss@samba.org>
Date: Tue, 13 Feb 2018 20:16:48 +0100
Subject: [PATCH 2/2] docs/vfs_ceph: describe ceph:service_user_id parameter

Signed-off-by: David Disseldorp <ddiss@samba.org>
---
 docs-xml/manpages/vfs_ceph.8.xml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs-xml/manpages/vfs_ceph.8.xml b/docs-xml/manpages/vfs_ceph.8.xml
index 453030e50de..e84d75b0963 100644
--- a/docs-xml/manpages/vfs_ceph.8.xml
+++ b/docs-xml/manpages/vfs_ceph.8.xml
@@ -97,6 +97,20 @@
 			</listitem>
 		</varlistentry>
 
+		<varlistentry>
+			<term>ceph:service_user_id = id</term>
+			<listitem>
+			<para>
+			Allows one to explicitly set the client ID used for
+			registration of the Samba smbd daemon with the Ceph
+			Manager service.
+			</para>
+			<para>
+			Example: ceph:service_user_id = client.service_reg
+			</para>
+			</listitem>
+		</varlistentry>
+
 	</variablelist>
 </refsect1>
-- 
2.13.6
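
For completeness, this is how the parameters consumed by patch 1/2 would
fit together in smb.conf (values illustrative; only ceph:service_user_id
is required to switch the registration on, and cluster_name/config_file
fall back to the librados defaults when unset):

	[global]
		# Ceph client name used for the mgr service registration
		ceph:service_user_id = client.service_reg
		# optional overrides of the librados defaults
		ceph:cluster_name = ceph
		ceph:config_file = /etc/ceph/ceph.conf

When ceph:service_user_id is absent, ceph_service_register() logs a debug
message and returns 0, so smbd starts normally without registering.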