From patchwork Mon Nov 13 15:58:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13454155 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 44FB0208AE for ; Mon, 13 Nov 2023 15:58:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CUqOlS+w" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67C0410EC for ; Mon, 13 Nov 2023 07:58:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1699891094; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VHCmFoypa2kVhT1S4U7DdPP9lkZiv+u3flT6Y9kyqx0=; b=CUqOlS+wI+A9rromq0j8Vtf/Mo68pXOcgRzl0nPyzyfXJxw/4x7FrpK0dQHT3MWCeRA0gv LafDZCGtVVIphJn3U5yf1NGZQM46LZdIUlmpUL+lqgWbWDHqZxROr4YKZYECZvHvC+idbC HWJRB9g5hi7S6vWjZjORiGttg+v9JN0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-13-D8-UlHItMBWGKDSLMRZfVg-1; Mon, 13 Nov 2023 10:58:10 -0500 X-MC-Unique: D8-UlHItMBWGKDSLMRZfVg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7B5C2185A781; Mon, 13 Nov 2023 15:58:10 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.42.28.16]) by smtp.corp.redhat.com (Postfix) with ESMTP id 74ED61121306; Mon, 13 Nov 2023 15:58:09 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 From: David Howells In-Reply-To: <45c0918e-5948-408e-a928-f764f3b1a42b@auristor.com> References: <45c0918e-5948-408e-a928-f764f3b1a42b@auristor.com> <20231109154004.3317227-1-dhowells@redhat.com> <20231109154004.3317227-40-dhowells@redhat.com> To: Jeffrey E Altman Cc: dhowells@redhat.com, Marc Dionne , linux-afs@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 42/41] afs: Fix the handling of RO volumes Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-ID: <3627312.1699891088.1@warthog.procyon.org.uk> Date: Mon, 13 Nov 2023 15:58:08 +0000 Message-ID: <3627313.1699891088@warthog.procyon.org.uk> X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.3 Jeffrey E Altman wrote: > The atomic visibility of a volume cloning operation is limited to individual > volume locations. It is untrue that a "vos release" updates all RO volume > locations simultaneously. [This need folding down into the preceding patches] Fix the handling of RO volumes which aren't actually updated all at once, but are rather updated in stages. Switch from using the old servers to the new servers when at least half the servers are marked as new replication sites (VLSF_NEWREPSITE). If an operation gets a reply from a server that indicates that an RO volume is of a more recent snapshot, we regenerate the server list and that excludes whichever bunch of servers we don't want to look at. If the server we got the reply from is now excluded, we reissue the operation, which will cause the rotation algorithm to start agian. Whilst it thinks that RO replication is in progress, it rechecks the VLDB record every 2s. The problem is that as we don't then maintain a callback on an excluded server, we won't get a notification of completion of the replication process and will maintain the exclusion until the normal VLDB recheck happens after an hour. To simplify things, the volume creation and update times are set to TIME64_MIN when not valid rather than using a separate flag, and a mutex is used to serialise attempts to modify them. David --- fs/afs/afs.h | 7 - fs/afs/dir.c | 4 - fs/afs/fs_operation.c | 5 - fs/afs/fsclient.c | 4 - fs/afs/inode.c | 5 - fs/afs/internal.h | 17 +++- fs/afs/rotate.c | 22 +++++- fs/afs/server_list.c | 52 ++++++++++++-- fs/afs/validation.c | 164 +++++++++++++++++++++++++++++++++++---------- fs/afs/vlclient.c | 19 ----- fs/afs/volume.c | 10 ++ fs/afs/yfsclient.c | 1 include/trace/events/afs.h | 3 13 files changed, 230 insertions(+), 83 deletions(-) diff --git a/fs/afs/afs.h b/fs/afs/afs.h index a6328a9ec9b0..b488072aee87 100644 --- a/fs/afs/afs.h +++ b/fs/afs/afs.h @@ -165,11 +165,8 @@ struct afs_status_cb { * AFS volume synchronisation information */ struct afs_volsync { - unsigned int mask; /* Bitmask of supplied fields */ -#define AFS_VOLSYNC_CREATION 0x01 -#define AFS_VOLSYNC_UPDATE 0x02 - time64_t creation; /* volume creation time */ - time64_t update; /* Volume update time */ + time64_t creation; /* Volume creation time (or TIME64_MIN) */ + time64_t update; /* Volume update time (or TIME64_MIN) */ }; /* diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 190d1ff42e60..7ea5803c05da 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -807,8 +807,8 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry, cookie->fids[i].vid = dvnode->fid.vid; cookie->ctx.actor = afs_lookup_filldir; cookie->name = dentry->d_name; - cookie->nr_fids = 2; /* slot 0 is saved for the fid we actually want - * and slot 1 for the directory */ + cookie->nr_fids = 2; /* slot 1 is saved for the fid we actually want + * and slot 0 for the directory */ if (!afs_server_supports_ibulk(dvnode)) cookie->one_only = true; diff --git a/fs/afs/fs_operation.c b/fs/afs/fs_operation.c index e0f5920a5aef..3546b087e791 100644 --- a/fs/afs/fs_operation.c +++ b/fs/afs/fs_operation.c @@ -39,9 +39,8 @@ struct afs_operation *afs_alloc_operation(struct key *key, struct afs_volume *vo op->volume = afs_get_volume(volume, afs_volume_trace_get_new_op); op->net = volume->cell->net; op->cb_v_break = atomic_read(&volume->cb_v_break); - op->pre_volsync.mask = READ_ONCE(volume->volsync_mask); - op->pre_volsync.creation = atomic64_read(&volume->creation_time); - op->pre_volsync.update = atomic64_read(&volume->update_time); + op->pre_volsync.creation = volume->creation_time; + op->pre_volsync.update = volume->update_time; op->debug_id = atomic_inc_return(&afs_operation_debug_counter); op->nr_iterations = -1; afs_op_set_error(op, -EDESTADDRREQ); diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 1bd1614688f7..80f7d9e796e3 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -165,10 +165,8 @@ static void xdr_decode_AFSVolSync(const __be32 **_bp, bp++; /* spare6 */ *_bp = bp; - if (volsync) { + if (volsync) volsync->creation = creation; - volsync->mask |= AFS_VOLSYNC_CREATION; - } } /* diff --git a/fs/afs/inode.c b/fs/afs/inode.c index 04c335fab1a5..0fe1c63c9af5 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -212,7 +212,8 @@ static void afs_apply_status(struct afs_operation *op, vnode->status = *status; if (vp->dv_before + vp->dv_delta != status->data_version) { - if (vnode->cb_expires_at != AFS_NO_CB_PROMISE) + if (vnode->cb_ro_snapshot == atomic_read(&vnode->volume->cb_ro_snapshot) && + vnode->cb_expires_at != AFS_NO_CB_PROMISE) pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s (op=%x)\n", vnode->fid.vid, vnode->fid.vnode, (unsigned long long)vp->dv_before + vp->dv_delta, @@ -327,8 +328,6 @@ static void afs_fetch_status_success(struct afs_operation *op) struct afs_vnode *vnode = vp->vnode; int ret; - afs_update_volume_state(op, vp); - if (vnode->netfs.inode.i_state & I_NEW) { ret = afs_inode_init_from_status(op, vp, vnode); afs_op_set_error(op, ret); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 43e438697baa..0c41601324fc 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -512,6 +512,7 @@ struct afs_vldb_entry { #define AFS_VOL_VTM_RW 0x01 /* R/W version of the volume is available (on this server) */ #define AFS_VOL_VTM_RO 0x02 /* R/O version of the volume is available (on this server) */ #define AFS_VOL_VTM_BAK 0x04 /* backup version of the volume is available (on this server) */ + u8 vlsf_flags[AFS_NMAXNSERVERS]; short error; u8 nr_servers; /* Number of server records */ u8 name_len; @@ -595,6 +596,12 @@ struct afs_server { spinlock_t probe_lock; }; +enum afs_ro_replicating { + AFS_RO_NOT_REPLICATING, /* Not doing replication */ + AFS_RO_REPLICATING_USE_OLD, /* Replicating; use old version */ + AFS_RO_REPLICATING_USE_NEW, /* Replicating; switch to new version */ +} __mode(byte); + /* * Replaceable volume server list. */ @@ -606,12 +613,14 @@ struct afs_server_entry { unsigned long flags; #define AFS_SE_VOLUME_OFFLINE 0 /* Set if volume offline notice given */ #define AFS_SE_VOLUME_BUSY 1 /* Set if volume busy notice given */ +#define AFS_SE_EXCLUDED 2 /* Set if server is to be excluded in rotation */ }; struct afs_server_list { struct rcu_head rcu; refcount_t usage; bool attached; /* T if attached to servers */ + enum afs_ro_replicating ro_replicating; /* RW->RO update (probably) in progress */ unsigned char nr_servers; unsigned short vnovol_mask; /* Servers to be skipped due to VNOVOL */ unsigned int seq; /* Set to ->servers_seq when installed */ @@ -647,9 +656,9 @@ struct afs_volume { unsigned int servers_seq; /* Incremented each time ->servers changes */ /* RO release tracking */ - unsigned int volsync_mask; /* Mask of what values we have obtained */ - atomic64_t creation_time; /* Volume creation time (time64_t) */ - atomic64_t update_time; /* Volume update time (time64_t) */ + struct mutex volsync_lock; /* Time/state evaluation lock */ + time64_t creation_time; /* Volume creation time (or TIME64_MIN) */ + time64_t update_time; /* Volume update time (or TIME64_MIN) */ /* Callback management */ struct mutex cb_check_lock; /* Lock to control race to check after v_break */ @@ -1571,7 +1580,7 @@ extern void afs_fs_exit(void); * validation.c */ bool afs_check_validity(const struct afs_vnode *vnode); -void afs_update_volume_state(struct afs_operation *op, struct afs_vnode_param *vp); +int afs_update_volume_state(struct afs_operation *op); int afs_validate(struct afs_vnode *vnode, struct key *key); /* diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c index 099e0d1caa67..79c736eeca29 100644 --- a/fs/afs/rotate.c +++ b/fs/afs/rotate.c @@ -187,6 +187,22 @@ bool afs_select_fileserver(struct afs_operation *op) clear_bit(AFS_SE_VOLUME_BUSY, &op->server_list->servers[op->server_index].flags); op->cumul_error.responded = true; + + /* We succeeded, but we may need to check the VLDB and redo the + * op from another server if we're looking at a set of RO + * volumes where some of the servers have not yet been brought + * up to date to avoid regressing the data. We only switch to + * the new version once >=50% of the servers are updated. + */ + error = afs_update_volume_state(op); + if (error != 0) { + if (error == 1) { + afs_sleep_and_retry(op); + goto restart_from_beginning; + } + afs_op_set_error(op, error); + goto failed; + } fallthrough; default: /* Success or local failure. Stop. */ @@ -517,10 +533,12 @@ bool afs_select_fileserver(struct afs_operation *op) best_prio = -1; for (i = 0; i < op->server_list->nr_servers; i++) { struct afs_endpoint_state *es; + struct afs_server_entry *se = &op->server_list->servers[i]; struct afs_addr_list *sal; - struct afs_server *s = op->server_list->servers[i].server; + struct afs_server *s = se->server; if (!test_bit(i, &op->untried_servers) || + test_bit(AFS_SE_EXCLUDED, &se->flags) || !test_bit(AFS_SERVER_FL_RESPONDING, &s->flags)) continue; es = op->server_states->endpoint_state; @@ -598,6 +616,8 @@ bool afs_select_fileserver(struct afs_operation *op) op->addr_index = addr_index; set_bit(addr_index, &op->addr_tried); + op->volsync.creation = TIME64_MIN; + op->volsync.update = TIME64_MIN; op->call_responded = false; _debug("address [%u] %u/%u %pISp", op->server_index, addr_index, alist->nr_addrs, diff --git a/fs/afs/server_list.c b/fs/afs/server_list.c index c9f146f69adf..5d45ff95946c 100644 --- a/fs/afs/server_list.c +++ b/fs/afs/server_list.c @@ -31,23 +31,55 @@ struct afs_server_list *afs_alloc_server_list(struct afs_volume *volume, struct afs_server_list *slist; struct afs_server *server; unsigned int type_mask = 1 << volume->type; - int ret = -ENOMEM, nr_servers = 0, i, j; - - for (i = 0; i < vldb->nr_servers; i++) - if (vldb->fs_mask[i] & type_mask) - nr_servers++; + bool use_newrepsites = false; + int ret = -ENOMEM, nr_servers = 0, newrep = 0, i, j; + u32 tmp; + + /* Work out if we're going to restrict to NEWREPSITE-marked servers or + * not. If at least one site is marked as NEWREPSITE, then it's likely + * that "vos release" is busy updating RO sites. We cut over from one + * to the other when >=50% of the sites have been updated. Sites that + * are in the process of being updated are marked DONTUSE. + */ + for (i = 0; i < vldb->nr_servers; i++) { + if (!(vldb->fs_mask[i] & type_mask)) + continue; + nr_servers++; + if (vldb->vlsf_flags[i] & AFS_VLSF_DONTUSE) + continue; + if (vldb->vlsf_flags[i] & AFS_VLSF_NEWREPSITE) + newrep++; + } slist = kzalloc(struct_size(slist, servers, nr_servers), GFP_KERNEL); if (!slist) goto error; + if (newrep) { + if (newrep < nr_servers / 2) { + kdebug("USE-OLD"); + slist->ro_replicating = AFS_RO_REPLICATING_USE_OLD; + } else { + kdebug("USE-NEW"); + slist->ro_replicating = AFS_RO_REPLICATING_USE_NEW; + use_newrepsites = true; + } + } + refcount_set(&slist->usage, 1); rwlock_init(&slist->lock); /* Make sure a records exists for each server in the list. */ for (i = 0; i < vldb->nr_servers; i++) { + unsigned long se_flags = 0; + bool newrepsite = tmp & AFS_VLSF_NEWREPSITE; + if (!(vldb->fs_mask[i] & type_mask)) continue; + if (tmp & AFS_VLSF_DONTUSE) + __set_bit(AFS_SE_EXCLUDED, &se_flags); + if (newrep && (newrepsite ^ use_newrepsites)) + __set_bit(AFS_SE_EXCLUDED, &se_flags); server = afs_lookup_server(volume->cell, key, &vldb->fs_server[i], vldb->addr_version[i]); @@ -79,6 +111,7 @@ struct afs_server_list *afs_alloc_server_list(struct afs_volume *volume, slist->servers[j].server = server; slist->servers[j].volume = volume; + slist->servers[j].flags = se_flags; slist->servers[j].cb_expires_at = AFS_NO_CB_PROMISE; slist->nr_servers++; } @@ -102,14 +135,19 @@ struct afs_server_list *afs_alloc_server_list(struct afs_volume *volume, bool afs_annotate_server_list(struct afs_server_list *new, struct afs_server_list *old) { + unsigned long mask = 1UL << AFS_SE_EXCLUDED; int i; - if (old->nr_servers != new->nr_servers) + if (old->nr_servers != new->nr_servers || + old->ro_replicating != new->ro_replicating) goto changed; - for (i = 0; i < old->nr_servers; i++) + for (i = 0; i < old->nr_servers; i++) { if (old->servers[i].server != new->servers[i].server) goto changed; + if ((old->servers[i].flags & mask) != (new->servers[i].flags & mask)) + goto changed; + } return false; changed: return true; diff --git a/fs/afs/validation.c b/fs/afs/validation.c index 3802790fe4de..f8e24439727a 100644 --- a/fs/afs/validation.c +++ b/fs/afs/validation.c @@ -135,49 +135,115 @@ bool afs_check_validity(const struct afs_vnode *vnode) return true; } +/* + * See if the server we've just talked to is currently excluded. + */ +static bool __afs_is_server_excluded(struct afs_operation *op, struct afs_volume *volume) +{ + const struct afs_server_entry *se; + const struct afs_server_list *slist; + bool is_excluded = true; + int i; + + rcu_read_lock(); + + slist = rcu_dereference(volume->servers); + for (i = 0; i < slist->nr_servers; i++) { + se = &slist->servers[i]; + if (op->server == se->server) { + is_excluded = test_bit(AFS_SE_EXCLUDED, &se->flags); + break; + } + } + + rcu_read_unlock(); + return is_excluded; +} + +/* + * Update the volume's server list when the creation time changes and see if + * the server we've just talked to is currently excluded. + */ +static int afs_is_server_excluded(struct afs_operation *op, struct afs_volume *volume) +{ + int ret; + + if (__afs_is_server_excluded(op, volume)) + return 1; + + set_bit(AFS_VOLUME_NEEDS_UPDATE, &volume->flags); + ret = afs_check_volume_status(op->volume, op); + if (ret < 0) + return ret; + + return __afs_is_server_excluded(op, volume); +} + /* * Handle a change to the volume creation time in the VolSync record. */ -static void afs_update_volume_creation_time(struct afs_operation *op, struct afs_volume *volume) +static int afs_update_volume_creation_time(struct afs_operation *op, struct afs_volume *volume) { - enum afs_cb_break_reason reason = afs_cb_break_for_vos_release; unsigned int snap; - time64_t cur = atomic64_read(&volume->creation_time); + time64_t cur = volume->creation_time; time64_t old = op->pre_volsync.creation; time64_t new = op->volsync.creation; + int ret; _enter("%llx,%llx,%llx->%llx", volume->vid, cur, old, new); - if (!(op->volsync.mask & AFS_VOLSYNC_CREATION) || - !(volume->volsync_mask & AFS_VOLSYNC_CREATION)) { - atomic64_set(&volume->creation_time, new); - volume->volsync_mask |= AFS_VOLSYNC_CREATION; - return; + if (cur == TIME64_MIN) { + volume->creation_time = new; + return 0; } - if (likely(new == cur)) - return; + if (new == cur) + return 0; + + /* Try to advance the creation timestamp from what we had before the + * operation to what we got back from the server. This should + * hopefully ensure that in a race between multiple operations only one + * of them will do this. + */ + if (cur != old) + return 0; /* If the creation time changes in an unexpected way, we need to scrub * our caches. For a RW vol, this will only change if the volume is * restored from a backup; for a RO/Backup vol, this will advance when * the volume is updated to a new snapshot (eg. "vos release"). */ - if (volume->type == AFSVL_RWVOL || new < old) - reason = afs_cb_break_for_creation_regress; + if (volume->type == AFSVL_RWVOL) + goto regressed; + if (volume->type == AFSVL_BACKVOL) { + if (new < old) + goto regressed; + goto advance; + } - /* Try to advance the creation timestamp from what we had before the - * operation to what we got back from the server. This should - * hopefully ensure that in a race between multiple operations only one - * of them will do this. + /* We have an RO volume, we need to query the VL server and look at the + * server flags to see if RW->RO replication is in progress. */ - if (atomic64_try_cmpxchg(&volume->creation_time, &old, new)) { - if (reason == afs_cb_break_for_creation_regress) - atomic_inc(&volume->cb_scrub); - else if (volume->type != AFSVL_RWVOL) - snap = atomic_inc_return(&volume->cb_ro_snapshot); - trace_afs_cb_v_break(volume->vid, snap, reason); + ret = afs_is_server_excluded(op, volume); + if (ret < 0) + return ret; + if (ret > 0) { + snap = atomic_read(&volume->cb_ro_snapshot); + trace_afs_cb_v_break(volume->vid, snap, afs_cb_break_volume_excluded); + return ret; } + +advance: + snap = atomic_inc_return(&volume->cb_ro_snapshot); + trace_afs_cb_v_break(volume->vid, snap, afs_cb_break_for_vos_release); + volume->creation_time = new; + return 0; + +regressed: + atomic_inc(&volume->cb_scrub); + trace_afs_cb_v_break(volume->vid, 0, afs_cb_break_for_creation_regress); + volume->creation_time = new; + return 0; } /* @@ -186,20 +252,18 @@ static void afs_update_volume_creation_time(struct afs_operation *op, struct afs static void afs_update_volume_update_time(struct afs_operation *op, struct afs_volume *volume) { enum afs_cb_break_reason reason = afs_cb_break_no_break; - time64_t cur = atomic64_read(&volume->update_time); + time64_t cur = volume->update_time; time64_t old = op->pre_volsync.update; time64_t new = op->volsync.update; _enter("%llx,%llx,%llx->%llx", volume->vid, cur, old, new); - if (!(op->volsync.mask & AFS_VOLSYNC_UPDATE) || - !(volume->volsync_mask & AFS_VOLSYNC_UPDATE)) { - atomic64_set(&volume->update_time, new); - volume->volsync_mask |= AFS_VOLSYNC_UPDATE; + if (cur == TIME64_MIN) { + volume->update_time = new; return; } - if (likely(new == cur)) + if (new == cur) return; /* If the volume update time changes in an unexpected way, we need to @@ -215,33 +279,59 @@ static void afs_update_volume_update_time(struct afs_operation *op, struct afs_v * hopefully ensure that in a race between multiple operations only one * of them will do this. */ - if (atomic64_try_cmpxchg(&volume->update_time, &old, new)) { + if (cur == old) { if (reason == afs_cb_break_for_update_regress) { atomic_inc(&volume->cb_scrub); trace_afs_cb_v_break(volume->vid, 0, reason); } + volume->update_time = new; + } +} + +static int afs_update_volume_times(struct afs_operation *op, struct afs_volume *volume) +{ + int ret = 0; + + if (likely(op->volsync.creation == volume->creation_time && + op->volsync.update == volume->update_time)) + return 0; + + mutex_lock(&volume->volsync_lock); + if (op->volsync.creation != volume->creation_time) { + ret = afs_update_volume_creation_time(op, volume); + if (ret < 0) + goto out; } + if (op->volsync.update != volume->update_time) + afs_update_volume_update_time(op, volume); +out: + mutex_unlock(&volume->volsync_lock); + return ret; } /* * Update the state of a volume, including recording the expiration time of the - * callback promise. + * callback promise. Returns 1 to redo the operation from the start. */ -void afs_update_volume_state(struct afs_operation *op, struct afs_vnode_param *vp) +int afs_update_volume_state(struct afs_operation *op) { struct afs_server_list *slist = op->server_list; struct afs_server_entry *se = &slist->servers[op->server_index]; - struct afs_callback *cb = &vp->scb.callback; + struct afs_callback *cb = &op->file[0].scb.callback; struct afs_volume *volume = op->volume; unsigned int cb_v_break = atomic_read(&volume->cb_v_break); unsigned int cb_v_check = atomic_read(&volume->cb_v_check); + int ret; _enter("%llx", op->volume->vid); - if (op->volsync.mask & AFS_VOLSYNC_CREATION) - afs_update_volume_creation_time(op, volume); - if (op->volsync.mask & AFS_VOLSYNC_UPDATE) - afs_update_volume_update_time(op, volume); + if (op->volsync.creation != TIME64_MIN || op->volsync.update != TIME64_MIN) { + ret = afs_update_volume_times(op, volume); + if (ret != 0) { + _leave(" = %d", ret); + return ret; + } + } if (op->cb_v_break == cb_v_break) { se->cb_expires_at = cb->expires_at; @@ -249,6 +339,8 @@ void afs_update_volume_state(struct afs_operation *op, struct afs_vnode_param *v } if (cb_v_check < op->cb_v_break) atomic_cmpxchg(&volume->cb_v_check, cb_v_check, op->cb_v_break); + _leave(" = %d", ret); + return ret; } /* diff --git a/fs/afs/vlclient.c b/fs/afs/vlclient.c index cef02a265edc..cac75f89b64a 100644 --- a/fs/afs/vlclient.c +++ b/fs/afs/vlclient.c @@ -18,8 +18,7 @@ static int afs_deliver_vl_get_entry_by_name_u(struct afs_call *call) { struct afs_uvldbentry__xdr *uvldb; struct afs_vldb_entry *entry; - bool new_only = false; - u32 tmp, nr_servers, vlflags; + u32 nr_servers, vlflags; int i, ret; _enter(""); @@ -41,27 +40,14 @@ static int afs_deliver_vl_get_entry_by_name_u(struct afs_call *call) entry->name[i] = 0; entry->name_len = strlen(entry->name); - /* If there is a new replication site that we can use, ignore all the - * sites that aren't marked as new. - */ - for (i = 0; i < nr_servers; i++) { - tmp = ntohl(uvldb->serverFlags[i]); - if (!(tmp & AFS_VLSF_DONTUSE) && - (tmp & AFS_VLSF_NEWREPSITE)) - new_only = true; - } - vlflags = ntohl(uvldb->flags); for (i = 0; i < nr_servers; i++) { struct afs_uuid__xdr *xdr; struct afs_uuid *uuid; + u32 tmp = ntohl(uvldb->serverFlags[i]); int j; int n = entry->nr_servers; - tmp = ntohl(uvldb->serverFlags[i]); - if (tmp & AFS_VLSF_DONTUSE || - (new_only && !(tmp & AFS_VLSF_NEWREPSITE))) - continue; if (tmp & AFS_VLSF_RWVOL) { entry->fs_mask[n] |= AFS_VOL_VTM_RW; if (vlflags & AFS_VLF_BACKEXISTS) @@ -82,6 +68,7 @@ static int afs_deliver_vl_get_entry_by_name_u(struct afs_call *call) for (j = 0; j < 6; j++) uuid->node[j] = (u8)ntohl(xdr->node[j]); + entry->vlsf_flags[n] = tmp; entry->addr_version[n] = ntohl(uvldb->serverUnique[i]); entry->nr_servers++; } diff --git a/fs/afs/volume.c b/fs/afs/volume.c index aebdeeed450a..fd5478e792da 100644 --- a/fs/afs/volume.c +++ b/fs/afs/volume.c @@ -84,11 +84,15 @@ static struct afs_volume *afs_alloc_volume(struct afs_fs_context *params, volume->type = params->type; volume->type_force = params->force; volume->name_len = vldb->name_len; + volume->creation_time = TIME64_MIN; + volume->update_time = TIME64_MIN; refcount_set(&volume->ref, 1); INIT_HLIST_NODE(&volume->proc_link); INIT_WORK(&volume->destructor, afs_destroy_volume); rwlock_init(&volume->servers_lock); + mutex_init(&volume->volsync_lock); + mutex_init(&volume->cb_check_lock); rwlock_init(&volume->cb_v_break_lock); INIT_LIST_HEAD(&volume->open_mmaps); init_rwsem(&volume->open_mmaps_lock); @@ -388,7 +392,11 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key) discard = old; } - volume->update_at = ktime_get_real_seconds() + afs_volume_record_life; + /* Check more often if replication is ongoing. */ + if (new->ro_replicating) + volume->update_at = ktime_get_real_seconds() + 2; + else + volume->update_at = ktime_get_real_seconds() + afs_volume_record_life; write_unlock(&volume->servers_lock); if (discard == old) diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c index 7c013070ef51..2d6943f05ea5 100644 --- a/fs/afs/yfsclient.c +++ b/fs/afs/yfsclient.c @@ -254,7 +254,6 @@ static void xdr_decode_YFSVolSync(const __be32 **_bp, update = xdr_to_u64(x->vol_update_date); do_div(update, 10 * 1000 * 1000); volsync->update = update; - volsync->mask |= AFS_VOLSYNC_CREATION | AFS_VOLSYNC_UPDATE; } *_bp += xdr_size(x); diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h index ac50fa687429..b2e0847eca47 100644 --- a/include/trace/events/afs.h +++ b/include/trace/events/afs.h @@ -442,7 +442,8 @@ enum yfs_cm_operation { EM(afs_cb_break_for_unlink, "break-unlink") \ EM(afs_cb_break_for_update_regress, "update-regress") \ EM(afs_cb_break_for_volume_callback, "break-v-cb") \ - E_(afs_cb_break_for_vos_release, "break-vos-release") + EM(afs_cb_break_for_vos_release, "break-vos-release") \ + E_(afs_cb_break_volume_excluded, "vol-excluded") #define afs_rotate_traces \ EM(afs_rotate_trace_aborted, "Abortd") \