diff mbox series

[v2,05/11] object-store: allow threaded access to object reading

Message ID 4c5652ab34f0989856aba919ca84b2b091dcad98.1569808052.git.matheus.bernardino@usp.br (mailing list archive)
State New, archived
Series grep: improve threading and fix race conditions

Commit Message

Matheus Tavares Sept. 30, 2019, 1:50 a.m. UTC
Allow object reading to be performed by multiple threads, protecting it
with an internal lock. The lock usage can be toggled with
enable_obj_read_lock() and disable_obj_read_lock(). Currently, the
functions which can be safely called in parallel are:
read_object_file_extended(), repo_read_object_file(),
read_object_file(), read_object_with_reference(), read_object(),
oid_object_info() and oid_object_info_extended(). It's also possible to
use obj_read_lock() and obj_read_unlock() to protect other sections that
cannot execute in parallel with object reading.

There are probably many spots in the functions listed above that could
be executed unlocked (and thus, in parallel). For now, though, we are
most interested in allowing parallel access to zlib inflation, as this
is one of the sections where object reading spends most of its time and
it's already thread-safe. To take advantage of that, the lock is
released before calling git_inflate() and re-acquired right after, at
every call site in oid_object_info_extended()'s call chain. We may
refine the lock to exploit other parallelizable spots in the future,
but threaded zlib inflation alone should already give great speedups.

Note that add_delta_base_cache() was also modified to skip adding
already present entries to the cache. Such duplicates weren't possible
before, but now they are, since phase I and phase III of unpack_entry()
may execute concurrently.

Another important thing to note is that the object reading lock only
works in conjunction with the 'struct raw_object_store's replace_mutex.
Otherwise, there would still be racy spots in the object reading
functions.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 object-store.h | 35 +++++++++++++++++++++++++++++++
 packfile.c     |  7 +++++++
 sha1-file.c    | 57 +++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 94 insertions(+), 5 deletions(-)

Comments

Jonathan Tan Nov. 12, 2019, 2:54 a.m. UTC | #1
> @@ -1580,7 +1585,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
>  	do {
>  		in = use_pack(p, w_curs, curpos, &stream.avail_in);
>  		stream.next_in = in;
> +		obj_read_unlock();
>  		st = git_inflate(&stream, Z_FINISH);
> +		obj_read_lock();
>  		if (!stream.avail_out)
>  			break; /* the payload is larger than it should be */
>  		curpos += stream.next_in - in;

As I see it, the main purpose of this patch set is to move the mutex
guarding object reading from builtin/grep.c (grep_read_mutex) to
object-store.h (obj_read_mutex), so that we can add "holes" (non-mutex
sections) such as the one quoted above, in order that zlib inflation can
happen outside the mutex.

My concern is that the presence of these "holes" makes object reading
non-thread-safe, defeating the purpose of obj_read_mutex. In particular,
the section quoted above assumes that the window section returned by
use_pack() is still valid throughout the inflation, but that window
could have been invalidated by things like an excess of open windows,
reprepare_packed_git(), etc.

I thought of this for a while but couldn't think of a good solution. If
we introduced a reference counting mechanism into Git, that would allow
us to hold the window open outside the mutex, but things like
reprepare_packed_git() would still be difficult.

If there's a good solution that only works for unpack_compressed_entry()
and not the other parts that also inflate, that might be sufficient (we
can inflate outside mutex just for this function, and inflate inside
mutex for the rest) - we would have to benchmark to be sure, but I think
that Matheus's main use case concerns grepping over a repo with a
packfile, not loose objects.
Jeff King Nov. 13, 2019, 5:20 a.m. UTC | #2
On Mon, Nov 11, 2019 at 06:54:18PM -0800, Jonathan Tan wrote:

> > @@ -1580,7 +1585,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
> >  	do {
> >  		in = use_pack(p, w_curs, curpos, &stream.avail_in);
> >  		stream.next_in = in;
> > +		obj_read_unlock();
> >  		st = git_inflate(&stream, Z_FINISH);
> > +		obj_read_lock();
> >  		if (!stream.avail_out)
> >  			break; /* the payload is larger than it should be */
> >  		curpos += stream.next_in - in;
> 
> As I see it, the main purpose of this patch set is to move the mutex
> guarding object reading from builtin/grep.c (grep_read_mutex) to
> object-store.h (obj_read_mutex), so that we can add "holes" (non-mutex
> sections) such as the one quoted above, in order that zlib inflation can
> happen outside the mutex.
> 
> My concern is that the presence of these "holes" make object reading
> non-thread-safe, defeating the purpose of obj_read_mutex. In particular,
> the section quoted above assumes that the window section returned by
> use_pack() is still valid throughout the inflation, but that window
> could have been invalidated by things like an excess of windows open,
> reprepare_packed_git(), etc.

Yeah, I don't think the code above is safe. The map window can be
modified by other threads.

> I thought of this for a while but couldn't think of a good solution. If
> we introduced a reference counting mechanism into Git, that would allow
> us to hold the window open outside the mutex, but things like
> reprepare_packed_git() would still be difficult.

I think you could put a reader-writer lock into each window. The code
here would take the reader lock, and multiple readers could use it at
the same time. Any time the window needs to be shifted, resized, or
discarded, that code would take the writer lock, waiting for (and then
blocking out) any readers.

A pthread_rwlock would work, but it would be the first use in Git. I
think we'd need to find an equivalent for compat/win32/pthread.h.

-Peff
Matheus Tavares Nov. 14, 2019, 5:57 a.m. UTC | #3
Hi, Peff and Jonathan

On Wed, Nov 13, 2019 at 2:20 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Nov 11, 2019 at 06:54:18PM -0800, Jonathan Tan wrote:
[...]
> > My concern is that the presence of these "holes" make object reading
> > non-thread-safe, defeating the purpose of obj_read_mutex. In particular,
> > the section quoted above assumes that the window section returned by
> > use_pack() is still valid throughout the inflation, but that window
> > could have been invalidated by things like an excess of windows open,
> > reprepare_packed_git(), etc.

Thank you for spotting this issue!

> > I thought of this for a while but couldn't think of a good solution. If
> > we introduced a reference counting mechanism into Git, that would allow
> > us to hold the window open outside the mutex, but things like
> > reprepare_packed_git() would still be difficult.
>
> I think you could put a reader-writer lock into each window. The code
> here would take the reader lock, and multiple readers could use it at
> the same time. Any time the window needs to be shifted, resized, or
> discarded, that code would take the writer lock, waiting for (and then
> blocking out) any readers.

Great idea, I'll work on that. Thanks!

About the other parallel inflation calls on loose objects at
unpack_loose_short_header(), unpack_loose_header_to_strbuf() and
unpack_loose_rest(): could they suffer from a similar race problem?
FWIU, the inflation input used in these cases comes from
map_loose_object(), and it's not referenced outside this scope. So I
think there's no risk of one thread munmapping the object file while
another is inflating it. Is that right?

> A pthread_rwlock would work, but it would be the first use in Git. I
> think we'd need to find an equivalent for compat/win32/pthread.h.

These [1][2] seem to be the equivalent options on Windows. I'll have
to read these docs more carefully, but [2] seems more interesting in
terms of speed. Also, the extra features of [1] are not really needed
for our use case here.

[1]: https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/reader-writer-spin-locks
[2]: https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks
Jeff King Nov. 14, 2019, 6:01 a.m. UTC | #4
On Thu, Nov 14, 2019 at 02:57:42AM -0300, Matheus Tavares Bernardino wrote:

> About the other parallel inflation calls on loose objects at
> unpack_loose_short_header(), unpack_loose_header_to_strbuf() and
> unpack_loose_rest(): could they suffer from a similar race problem?
> FWIU, the inflation input used in these cases comes from
> map_loose_object(), and it's not referenced outside this scope. So I
> think there's no risk of one thread munmapping the object file while
> other is inflating it. Is that right?

Right, I think loose objects would be fine, because the mmap'd content
is local to that stack variable.

> > A pthread_rwlock would work, but it would be the first use in Git. I
> > think we'd need to find an equivalent for compat/win32/pthread.h.
> 
> These[1][2] seems to be the equivalent options on Windows. I'll have
> to read these docs more carefully, but [2] seems to be more
> interesting in terms of speed. Also, the extra features of [1] are not
> really needed for our use case here.
> 
> [1]: https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/reader-writer-spin-locks
> [2]: https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks

Yeah, looks like it, but I don't have any expertise there (nor a Windows
system to test on).

-Peff
Jonathan Tan Nov. 14, 2019, 6:15 p.m. UTC | #5
> > > A pthread_rwlock would work, but it would be the first use in Git. I
> > > think we'd need to find an equivalent for compat/win32/pthread.h.
> > 
> > These[1][2] seems to be the equivalent options on Windows. I'll have
> > to read these docs more carefully, but [2] seems to be more
> > interesting in terms of speed. Also, the extra features of [1] are not
> > really needed for our use case here.
> > 
> > [1]: https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/reader-writer-spin-locks
> > [2]: https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks
> 
> Yeah, looks like it, but I don't have any expertise there (nor a Windows
> system to test on).

One thing to note is that if we do this, we'll have one rwlock per
pack window. I couldn't find out what the Windows limits are, but it
seems that pthreads does not guarantee the absence of a limit [1]:

> Defining symbols for the maximum number of mutexes and condition
> variables was considered but rejected because the number of these
> objects may change dynamically. Furthermore, many implementations
> place these objects into application memory; thus, there is no
> explicit maximum.

[1] https://linux.die.net/man/3/pthread_mutex_init
Jeff King Nov. 15, 2019, 4:12 a.m. UTC | #6
On Thu, Nov 14, 2019 at 10:15:52AM -0800, Jonathan Tan wrote:

> > > > A pthread_rwlock would work, but it would be the first use in Git. I
> > > > think we'd need to find an equivalent for compat/win32/pthread.h.
> > > 
> > > These[1][2] seems to be the equivalent options on Windows. I'll have
> > > to read these docs more carefully, but [2] seems to be more
> > > interesting in terms of speed. Also, the extra features of [1] are not
> > > really needed for our use case here.
> > > 
> > > [1]: https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/reader-writer-spin-locks
> > > [2]: https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks
> > 
> > Yeah, looks like it, but I don't have any expertise there (nor a Windows
> > system to test on).
> 
> One thing to note is that that if we do this, we'll be having one rwlock
> per pack window. I couldn't find out what the Windows limits were, but
> it seems that pthreads does not mandate having no limit [1]:

Yeah, interesting point. We shouldn't _usually_ have too many windows at
once, but I think this would be the first place where we allocate a
non-constant number of thread mechanisms. And there are degenerate cases
where you might have tens of thousands of packs.

I suspect it's a non-issue in practice, though. Any limits there are
likely related to kernel resources like descriptors. And those
degenerate cases already run into issues there (did you know that Linux
systems limit the number of mmaps a single process can have? I didn't
until I tried repacking a repo with 35,000 packs).

-Peff
Matheus Tavares Dec. 19, 2019, 10:27 p.m. UTC | #7
Hi, Peff and Jonathan

Sorry for the delay in re-rolling this series, it was a very busy end
of semester. But I finally completed my degree and had time to get
back to it :)

I tried the rwlock approach, but there were some subtle difficulties.
For example, we would have to hold the lock in write mode when
free()-ing the window, so the lock couldn't be declared inside struct
pack_window itself.
Also, it seemed that protecting the window reading at git_inflate()
wouldn't be enough: suppose a thread has just read from a window in
git_inflate() (holding the rwlock) and is now waiting for the
obj_read_mutex to continue its object reading operation. If another
thread (that already holds the obj_read_mutex) acquires the rwlock in
sequence, it could free() that window. This might not sound like a
problem, since the first thread has already finished reading from it.
But since a pointer to the window would still be in the first thread's
stack as a window cursor, it could later be passed down to use_pack(),
leading to a segfault. I couldn't come up with a solution for this
yet.

However, re-inspecting the code, it seemed to me that we might already
have a thread-safe mechanism. The window disposal operations (at
close_pack_windows() and unuse_one_window()) are only performed when
window.inuse_cnt == 0. So, since a thread that reads from a window will
have previously incremented its inuse_cnt, wouldn't the reading
operation already be protected?

Another concern would be close_pack_fd(), which can close packs even
with in-use windows. However, as the mmap docs [1] say: "closing the
file descriptor does not unmap the region".

Finally, we also considered reprepare_packed_git() as a possible
conflicting operation. But the function called by it to handle
packfile opening, prepare_pack(), won't reopen already available
packs. Therefore, IIUC, it will leave the opened windows intact.

So, aren't perhaps the window readings performed outside the
obj_read_mutex critical section already thread-safe?

Thanks,
Matheus

[1]: https://linux.die.net/man/2/mmap
Matheus Tavares Jan. 9, 2020, 10:02 p.m. UTC | #8
Hi, folks

On Thu, Dec 19, 2019 at 5:27 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
[...]
> However, re-inspecting the code, it seemed to me that we might already
> have a thread-safe mechanism. The window disposal operations (at
> close_pack_windows() and unuse_one_window()) are only performed if
> window.inuse_cnt == 0. So as a thread which reads from the window will
> also previously increment its inuse_cnt, wouldn't the reading
> operation be already protected?
>
> Another concern would be close_pack_fd(), which can close packs even
> with in-use windows. However, as the mmap docs[1] says: "closing the
> file descriptor does not unmap the region".
>
> Finally, we also considered reprepare_packed_git() as a possible
> conflicting operation. But the function called by it to handle
> packfile opening, prepare_pack(), won't reopen already available
> packs. Therefore, IIUC, it will leave the opened windows intact.
>
> So, aren't perhaps the window readings performed outside the
> obj_read_mutex critical section already thread-safe?

Any thoughts on this?

> Thanks,
> Matheus
>
> [1]: https://linux.die.net/man/2/mmap
Christian Couder Jan. 10, 2020, 7:07 p.m. UTC | #9
Hi Matheus,

On Thu, Jan 9, 2020 at 11:02 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Thu, Dec 19, 2019 at 5:27 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:

> > However, re-inspecting the code, it seemed to me that we might already
> > have a thread-safe mechanism. The window disposal operations (at
> > close_pack_windows() and unuse_one_window()) are only performed if
> > window.inuse_cnt == 0. So as a thread which reads from the window will
> > also previously increment its inuse_cnt, wouldn't the reading
> > operation be already protected?
> >
> > Another concern would be close_pack_fd(), which can close packs even
> > with in-use windows. However, as the mmap docs[1] says: "closing the
> > file descriptor does not unmap the region".
> >
> > Finally, we also considered reprepare_packed_git() as a possible
> > conflicting operation. But the function called by it to handle
> > packfile opening, prepare_pack(), won't reopen already available
> > packs. Therefore, IIUC, it will leave the opened windows intact.
> >
> > So, aren't perhaps the window readings performed outside the
> > obj_read_mutex critical section already thread-safe?
>
> Any thoughts on this?

My advice, if you don't hear from anyone in the next few days, is to
update the commit message, the cover letter and the comments inside
the patches with the information you researched and to resubmit a new
version of the patch series.

Thanks,
Christian.

Patch

diff --git a/object-store.h b/object-store.h
index b22e20ad7d..8f63f21ad2 100644
--- a/object-store.h
+++ b/object-store.h
@@ -6,6 +6,7 @@ 
 #include "list.h"
 #include "sha1-array.h"
 #include "strbuf.h"
+#include "thread-utils.h"
 
 struct object_directory {
 	struct object_directory *next;
@@ -230,6 +231,40 @@  int has_loose_object_nonlocal(const struct object_id *);
 
 void assert_oid_type(const struct object_id *oid, enum object_type expect);
 
+/*
+ * Enabling the object read lock allows multiple threads to safely call the
+ * following functions in parallel: repo_read_object_file(), read_object_file(),
+ * read_object_file_extended(), read_object_with_reference(), read_object(),
+ * oid_object_info() and oid_object_info_extended().
+ *
+ * obj_read_lock() and obj_read_unlock() may also be used to protect other
+ * sections which cannot execute in parallel with object reading. Since the used
+ * lock is a recursive mutex, these sections can even contain calls to object
+ * reading functions. However, beware that in these cases zlib inflation won't
+ * be performed in parallel, losing performance.
+ *
+ * TODO: oid_object_info_extended()'s call stack has a recursive behavior. If
+ * any of its callees end up calling it, this recursive call won't benefit from
+ * parallel inflation.
+ */
+void enable_obj_read_lock(void);
+void disable_obj_read_lock(void);
+
+extern int obj_read_use_lock;
+extern pthread_mutex_t obj_read_mutex;
+
+static inline void obj_read_lock(void)
+{
+	if (obj_read_use_lock)
+		pthread_mutex_lock(&obj_read_mutex);
+}
+
+static inline void obj_read_unlock(void)
+{
+	if (obj_read_use_lock)
+		pthread_mutex_unlock(&obj_read_mutex);
+}
+
 struct object_info {
 	/* Request */
 	enum object_type *typep;
diff --git a/packfile.c b/packfile.c
index 1a7d69fe32..a336972174 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1098,7 +1098,9 @@  unsigned long get_size_from_delta(struct packed_git *p,
 	do {
 		in = use_pack(p, w_curs, curpos, &stream.avail_in);
 		stream.next_in = in;
+		obj_read_unlock();
 		st = git_inflate(&stream, Z_FINISH);
+		obj_read_lock();
 		curpos += stream.next_in - in;
 	} while ((st == Z_OK || st == Z_BUF_ERROR) &&
 		 stream.total_out < sizeof(delta_head));
@@ -1451,6 +1453,9 @@  static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	struct delta_base_cache_entry *ent = xmalloc(sizeof(*ent));
 	struct list_head *lru, *tmp;
 
+	if (get_delta_base_cache_entry(p, base_offset))
+		return;
+
 	delta_base_cached += base_size;
 
 	list_for_each_safe(lru, tmp, &delta_base_cache_lru) {
@@ -1580,7 +1585,9 @@  static void *unpack_compressed_entry(struct packed_git *p,
 	do {
 		in = use_pack(p, w_curs, curpos, &stream.avail_in);
 		stream.next_in = in;
+		obj_read_unlock();
 		st = git_inflate(&stream, Z_FINISH);
+		obj_read_lock();
 		if (!stream.avail_out)
 			break; /* the payload is larger than it should be */
 		curpos += stream.next_in - in;
diff --git a/sha1-file.c b/sha1-file.c
index e85f249a5d..b4f2f5cb94 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1148,6 +1148,8 @@  static int unpack_loose_short_header(git_zstream *stream,
 				     unsigned char *map, unsigned long mapsize,
 				     void *buffer, unsigned long bufsiz)
 {
+	int ret;
+
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
 	stream->next_in = map;
@@ -1156,7 +1158,11 @@  static int unpack_loose_short_header(git_zstream *stream,
 	stream->avail_out = bufsiz;
 
 	git_inflate_init(stream);
-	return git_inflate(stream, 0);
+	obj_read_unlock();
+	ret = git_inflate(stream, 0);
+	obj_read_lock();
+
+	return ret;
 }
 
 int unpack_loose_header(git_zstream *stream,
@@ -1201,7 +1207,9 @@  static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
 	stream->avail_out = bufsiz;
 
 	do {
+		obj_read_unlock();
 		status = git_inflate(stream, 0);
+		obj_read_lock();
 		strbuf_add(header, buffer, stream->next_out - (unsigned char *)buffer);
 		if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 			return 0;
@@ -1241,8 +1249,11 @@  static void *unpack_loose_rest(git_zstream *stream,
 		 */
 		stream->next_out = buf + bytes;
 		stream->avail_out = size - bytes;
-		while (status == Z_OK)
+		while (status == Z_OK) {
+			obj_read_unlock();
 			status = git_inflate(stream, Z_FINISH);
+			obj_read_lock();
+		}
 	}
 	if (status == Z_STREAM_END && !stream->avail_in) {
 		git_inflate_end(stream);
@@ -1412,10 +1423,32 @@  static int loose_object_info(struct repository *r,
 	return (status < 0) ? status : 0;
 }
 
+int obj_read_use_lock = 0;
+pthread_mutex_t obj_read_mutex;
+
+void enable_obj_read_lock(void)
+{
+	if (obj_read_use_lock)
+		return;
+
+	obj_read_use_lock = 1;
+	init_recursive_mutex(&obj_read_mutex);
+}
+
+void disable_obj_read_lock(void)
+{
+	if (!obj_read_use_lock)
+		return;
+
+	obj_read_use_lock = 0;
+	pthread_mutex_destroy(&obj_read_mutex);
+}
+
 int fetch_if_missing = 1;
 
-int oid_object_info_extended(struct repository *r, const struct object_id *oid,
-			     struct object_info *oi, unsigned flags)
+static int do_oid_object_info_extended(struct repository *r,
+				       const struct object_id *oid,
+				       struct object_info *oi, unsigned flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
 	struct pack_entry e;
@@ -1423,6 +1456,7 @@  int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	const struct object_id *real = oid;
 	int already_retried = 0;
 
+
 	if (flags & OBJECT_INFO_LOOKUP_REPLACE)
 		real = lookup_replace_object(r, oid);
 
@@ -1498,7 +1532,7 @@  int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	rtype = packed_object_info(r, e.p, e.offset, oi);
 	if (rtype < 0) {
 		mark_bad_packed_object(e.p, real->hash);
-		return oid_object_info_extended(r, real, oi, 0);
+		return do_oid_object_info_extended(r, real, oi, 0);
 	} else if (oi->whence == OI_PACKED) {
 		oi->u.packed.offset = e.offset;
 		oi->u.packed.pack = e.p;
@@ -1509,6 +1543,17 @@  int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 	return 0;
 }
 
+int oid_object_info_extended(struct repository *r, const struct object_id *oid,
+			     struct object_info *oi, unsigned flags)
+{
+	int ret;
+	obj_read_lock();
+	ret = do_oid_object_info_extended(r, oid, oi, flags);
+	obj_read_unlock();
+	return ret;
+}
+
+
 /* returns enum object_type or negative */
 int oid_object_info(struct repository *r,
 		    const struct object_id *oid,
@@ -1581,6 +1626,7 @@  void *read_object_file_extended(struct repository *r,
 	if (data)
 		return data;
 
+	obj_read_lock();
 	if (errno && errno != ENOENT)
 		die_errno(_("failed to read object %s"), oid_to_hex(oid));
 
@@ -1596,6 +1642,7 @@  void *read_object_file_extended(struct repository *r,
 	if ((p = has_packed_and_bad(r, repl->hash)) != NULL)
 		die(_("packed object %s (stored in %s) is corrupt"),
 		    oid_to_hex(repl), p->pack_name);
+	obj_read_unlock();
 
 	return NULL;
 }