[GSoC,RFC] clone: use dir-iterator to avoid explicit dir traversal
diff mbox series

Message ID 20190213205554.4086-1-matheus.bernardino@usp.br
State New
Headers show
Series
  • [GSoC,RFC] clone: use dir-iterator to avoid explicit dir traversal
Related show

Commit Message

Matheus Tavares Bernardino Feb. 13, 2019, 8:55 p.m. UTC
Replace usage of opendir/readdir/closedir API to traverse directories
recursively, at copy_or_link_directory function, by the dir-iterator
API.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
This is my microproject for GSoC 2019. It's still a RFC because I have  
some questions. Any help will be much appreciated.                      
                                                                        
There're three places inside copy_or_link_directory's loop where        
die_errno() is called. Should I call dir_iterator_abort, at these       
places, before die_errno() is called (to free resources)?               
                                                                        
And if so, should I check dir_iterator_abort's return code? It's said at
dir-iterator.h that dir_iterator_abort returns ITER_ERROR on error, but 
by the code, we can see that it only possibly returns ITER_DONE.        
                                                                        
Finally, if this call and check both needs to be done, there'll be code 
replication in this three places. Should I add a goto and do all this    
stuff at the function end? (But then, I would have to store die_errno's 
error messages and errno code in temporary variables. Is this approach  
good?) 

 builtin/clone.c | 65 +++++++++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 27 deletions(-)

Comments

Christian Couder Feb. 14, 2019, 9:16 p.m. UTC | #1
On Thu, Feb 14, 2019 at 1:16 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Replace usage of opendir/readdir/closedir API to traverse directories
> recursively, at copy_or_link_directory function, by the dir-iterator
> API.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
> This is my microproject for GSoC 2019. It's still a RFC because I have
> some questions. Any help will be much appreciated.

Thanks for working on a microproject!

> There're three places inside copy_or_link_directory's loop where
> die_errno() is called. Should I call dir_iterator_abort, at these
> places, before die_errno() is called (to free resources)?

I don't think it's necessary. We care about freeing resources when we
report errors (for example by returning an error code from inside a
function), but not when we are exiting.

> -static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> -                                  const char *src_repo, int src_baselen)
> +static void mkdir_if_missing(const char *pathname, mode_t mode)

It looks like your patch is both splitting copy_or_link_directory()
into 2 functions and converting it to use the dir-iterator API. Maybe
it would be better to have 2 patches instead, one for splitting it and
one for converting it.

>  {
> -       struct dirent *de;
> +       /*
> +        * Tries to create a dir at pathname. If pathname already exists and
> +        * is a dir, do nothing.
> +        */

I think we usually put such comments just before the function. And
maybe it could be shortened to "Create a dir at pathname unless
there's already one"
Matheus Tavares Bernardino Feb. 14, 2019, 10:03 p.m. UTC | #2
On Thu, Feb 14, 2019 at 7:16 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Thu, Feb 14, 2019 at 1:16 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Replace usage of opendir/readdir/closedir API to traverse directories
> > recursively, at copy_or_link_directory function, by the dir-iterator
> > API.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
> > This is my microproject for GSoC 2019. It's still a RFC because I have
> > some questions. Any help will be much appreciated.
>
> Thanks for working on a microproject!
>

Hi, Christian. Thank you for the review and comments.

> > There're three places inside copy_or_link_directory's loop where
> > die_errno() is called. Should I call dir_iterator_abort, at these
> > places, before die_errno() is called (to free resources)?
>
> I don't think it's necessary. We care about freeing resources when we
> report errors (for example by returning an error code from inside a
> function), but not when we are exiting.
>

Ok, thanks!

> > -static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> > -                                  const char *src_repo, int src_baselen)
> > +static void mkdir_if_missing(const char *pathname, mode_t mode)
>
> It looks like your patch is both splitting copy_or_link_directory()
> into 2 functions and converting it to use the dir-iterator API. Maybe
> it would be better to have 2 patches instead, one for splitting it and
> one for converting it.
>

Got it. As the justification for splitting the function was to use the
extracted part in the section that was previously recursive, I thought
both changes should be in the same patch. But I really was in doubt
about that. Should I split it into two patches and mention that
justification at the first one? Or just split?

> >  {
> > -       struct dirent *de;
> > +       /*
> > +        * Tries to create a dir at pathname. If pathname already exists and
> > +        * is a dir, do nothing.
> > +        */
>
> I think we usually put such comments just before the function. And
> maybe it could be shortened to "Create a dir at pathname unless
> there's already one"

Right, the shortened version does sounds much better, thanks! About
the comment placement, I followed what I saw in other functions from
the same file ("copy_alternates", for example). But also, I couldn't
find any instruction about that at Documentation/CodingGuidelines. So
should I move it as you suggested?
Christian Couder Feb. 15, 2019, 8:59 a.m. UTC | #3
On Thu, Feb 14, 2019 at 11:04 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Thu, Feb 14, 2019 at 7:16 PM Christian Couder
> <christian.couder@gmail.com> wrote:

> > > -static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> > > -                                  const char *src_repo, int src_baselen)
> > > +static void mkdir_if_missing(const char *pathname, mode_t mode)
> >
> > It looks like your patch is both splitting copy_or_link_directory()
> > into 2 functions and converting it to use the dir-iterator API. Maybe
> > it would be better to have 2 patches instead, one for splitting it and
> > one for converting it.
> >
>
> Got it. As the justification for splitting the function was to use the
> extracted part in the section that was previously recursive, I thought
> both changes should be in the same patch. But I really was in doubt
> about that. Should I split it into two patches and mention that
> justification at the first one? Or just split?

If 2 patches instead of 1 makes it easier to review and understand
what's going on, then I think you should indeed send 2 patches and
mention the justification for splitting the function in the commit
message of the first patch.

> > I think we usually put such comments just before the function. And
> > maybe it could be shortened to "Create a dir at pathname unless
> > there's already one"
>
> Right, the shortened version does sounds much better, thanks! About
> the comment placement, I followed what I saw in other functions from
> the same file ("copy_alternates", for example).

Then it's ok to place it like you did. Sorry about the noise.

> But also, I couldn't
> find any instruction about that at Documentation/CodingGuidelines. So
> should I move it as you suggested?

I think we have not been very consistent over the years. Recently I
think we have tried to add or move API documentation inside header
files, and in general before functions in the code, but yeah it might
not have been documented.

Patch
diff mbox series

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..f5208ad9ca 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,8 @@ 
 #include "transport.h"
 #include "strbuf.h"
 #include "dir.h"
+#include "dir-iterator.h"
+#include "iterator.h"
 #include "sigchain.h"
 #include "branch.h"
 #include "remote.h"
@@ -392,50 +394,55 @@  static void copy_alternates(struct strbuf *src, struct strbuf *dst,
 	fclose(in);
 }
 
-static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
-				   const char *src_repo, int src_baselen)
+static void mkdir_if_missing(const char *pathname, mode_t mode)
 {
-	struct dirent *de;
+	/*
+	 * Tries to create a dir at pathname. If pathname already exists and
+	 * is a dir, do nothing.
+	 */
 	struct stat buf;
-	int src_len, dest_len;
-	DIR *dir;
 
-	dir = opendir(src->buf);
-	if (!dir)
-		die_errno(_("failed to open '%s'"), src->buf);
-
-	if (mkdir(dest->buf, 0777)) {
+	if (mkdir(pathname, mode)) {
 		if (errno != EEXIST)
-			die_errno(_("failed to create directory '%s'"), dest->buf);
-		else if (stat(dest->buf, &buf))
-			die_errno(_("failed to stat '%s'"), dest->buf);
+			die_errno(_("failed to create directory '%s'"),
+				  pathname);
+		else if (stat(pathname, &buf))
+			die_errno(_("failed to stat '%s'"), pathname);
 		else if (!S_ISDIR(buf.st_mode))
-			die(_("%s exists and is not a directory"), dest->buf);
+			die(_("%s exists and is not a directory"), pathname);
 	}
+}
+
+static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
+				   const char *src_repo, int src_baselen)
+{
+	int src_len, dest_len;
+	struct dir_iterator *iter;
+	int iter_status;
+
+	mkdir_if_missing(dest->buf, 0777);
+
+	iter = dir_iterator_begin(src->buf);
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
 	strbuf_addch(dest, '/');
 	dest_len = dest->len;
 
-	while ((de = readdir(dir)) != NULL) {
+	while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) {
 		strbuf_setlen(src, src_len);
-		strbuf_addstr(src, de->d_name);
+		strbuf_addstr(src, iter->relative_path);
 		strbuf_setlen(dest, dest_len);
-		strbuf_addstr(dest, de->d_name);
-		if (stat(src->buf, &buf)) {
-			warning (_("failed to stat %s\n"), src->buf);
-			continue;
-		}
-		if (S_ISDIR(buf.st_mode)) {
-			if (de->d_name[0] != '.')
-				copy_or_link_directory(src, dest,
-						       src_repo, src_baselen);
+		strbuf_addstr(dest, iter->relative_path);
+
+		if (S_ISDIR(iter->st.st_mode)) {
+			if (iter->basename[0] != '.')
+				mkdir_if_missing(dest->buf, 0777);
 			continue;
 		}
 
 		/* Files that cannot be copied bit-for-bit... */
-		if (!strcmp(src->buf + src_baselen, "/info/alternates")) {
+		if (!strcmp(iter->relative_path, "info/alternates")) {
 			copy_alternates(src, dest, src_repo);
 			continue;
 		}
@@ -452,7 +459,11 @@  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (copy_file_with_time(dest->buf, src->buf, 0666))
 			die_errno(_("failed to copy file to '%s'"), dest->buf);
 	}
-	closedir(dir);
+
+	if (iter_status != ITER_DONE) {
+		strbuf_setlen(src, src_len);
+		die(_("failed to iterate over '%s'"), src->buf);
+	}
 }
 
 static void clone_local(const char *src_repo, const char *dest_repo)