Recursively remove spaces/special characters from files/dirs

I have often needed to remove spaces or other special characters from the names of files/dirs in a directory and all of its subdirectories. Here is how I do it.

First cd to the base folder and then run:

e='s/[ #,]/-/g'; find . | while read f;do c=`basename "$f"`; b=`echo ${f%$c} | sed "$e"`; rename -v "$e" "$b`basename "$f"`"; done

What's happening here? Let's look at it step by step.

e='s/[ #,]/-/g';

The above is the substitution expression used for renaming. It replaces every space, "#" and "," with the character "-".
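To see what the expression does before pointing it at real files, it can be fed to sed with a sample name. This is only a quick sketch; the sample filename and the variable names other than "e" are made up for illustration.

```shell
e='s/[ #,]/-/g'
# Hypothetical sample name containing a space, a "#" and a ","
sample='my file#1,final.txt'
cleaned=$(printf '%s\n' "$sample" | sed "$e")
printf '%s\n' "$cleaned"   # prints: my-file-1-final.txt
```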

find . | while read f;

The above lists every file and directory under the current directory, including all subdirectories, and the loop reads the paths into the variable "$f" one at a time.
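One caveat, sketched here assuming a POSIX shell: a plain "read f" strips leading and trailing whitespace and treats backslashes as escape characters, so unusual names can come out altered. Using "IFS= read -r f" keeps each line exactly as find printed it. The sample string below is made up, and "plain" and "safe" are illustrative variable names.

```shell
# Feed the same line (two leading spaces and a backslash) to both forms of read.
plain=$(printf '%s\n' '  a\b' | { read f; printf '%s' "$f"; })
safe=$(printf '%s\n' '  a\b' | { IFS= read -r f; printf '%s' "$f"; })
printf 'plain: [%s]\n' "$plain"   # prints: plain: [ab]
printf 'safe:  [%s]\n' "$safe"    # prints: safe:  [  a\b]
```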

c=`basename "$f"`;

Here we extract the last component of the path. For example, running basename on a file/dir like "a a/b b/c c" returns "c c".
Now comes the interesting part:

b=`echo ${f%$c} | sed "$e"`;

${f%$c} strips the last component from the path and keeps only the directory part. So for "a a/b b/c c" we get "a a/b b/". This result is then piped to sed, which applies the same expression as before, so "$b" ends up as "a-a/b-b/".
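The two steps can be tried on their own with the example path. This is a minimal sketch: the extra variable "d" is introduced only for illustration, and the expansion is quoted here (unlike in the one-liner) so that runs of spaces or glob characters in a name are passed through untouched.

```shell
e='s/[ #,]/-/g'
f='a a/b b/c c'
c=$(basename "$f")                    # last component: "c c"
d=${f%"$c"}                           # directory part: "a a/b b/"
b=$(printf '%s\n' "$d" | sed "$e")    # cleaned directory part: "a-a/b-b/"
printf '%s\n' "$c" "$d" "$b" "$b$c"
```

This prints "c c", "a a/b b/", "a-a/b-b/" and "a-a/b-b/c c", one per line.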
Now the only part left is:

rename -v "$e" "$b`basename "$f"`"

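The above calls the rename command with the same expression. This assumes the Perl-based rename, which takes a substitution expression as its first argument. What's interesting is that the file to rename is "$b`basename "$f"`", which in our example becomes "a-a/b-b/c c". rename then applies the expression to that path and turns it into "a-a/b-b/c-c".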

But why do we need to do this? This will be much clearer with an example.

cd /tmp; mkdir -p "tp/a a/b b/c c"; cd tp
find . 

This is what the output will look like:

.
./a a
./a a/b b
./a a/b b/c c

Now, this output is read into the variable $f line by line.
Line 1: is ".". There is nothing to rename, so it is of no importance.
Line 2: is "./a a". This gets renamed to "./a-a".
Line 3: is "./a a/b b". But notice that this path does not exist anymore, because in the previous step we renamed "./a a" to "./a-a"; the actual path is now "./a-a/b b". This in turn gets renamed to "./a-a/b-b".
Line 4: is "./a a/b b/c c". Again, this path does not exist, as "a a" and "b b" have been renamed to "a-a" and "b-b" respectively. So the correct path is "./a-a/b-b/c c".
Hence we need to split the full path into its directory part and its last component, apply the expression to the directory part (all of those directories have already been renamed by then), and put the full path back together before renaming.
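The walk-through can be checked with a dry run that renames nothing. The sketch below uses a hypothetical helper, preview_paths, which prints each path as find reports it next to the path the loop would hand to rename; the scratch directory from mktemp is also just for the demo.

```shell
# preview_paths is a hypothetical helper: for each path find reports, it
# prints the reconstructed path the loop would pass on. Nothing is renamed.
preview_paths() {
    e='s/[ #,]/-/g'
    find . | while IFS= read -r f; do
        c=$(basename "$f")
        b=$(printf '%s\n' "${f%"$c"}" | sed "$e")
        printf '%s -> %s\n' "$f" "$b$c"
    done
}

demo=$(mktemp -d)                  # scratch directory for the demo
mkdir -p "$demo/a a/b b/c c"
preview=$(cd "$demo" && preview_paths)
printf '%s\n' "$preview"
rm -rf "$demo"
```

For the example tree this prints:

. -> .
./a a -> ./a a
./a a/b b -> ./a-a/b b
./a a/b b/c c -> ./a-a/b-b/c c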
Quite tricky!
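A different way around the stale-path problem, offered only as a sketch and not as the method described above, is to ask find for a depth-first listing with -depth. Everything inside a directory is then listed, and renamed, before the directory itself, so every path is still valid when the loop reaches it. The helper name clean_names is hypothetical, plain mv stands in for rename, names containing newlines are not handled, and a target that already exists is skipped rather than overwritten.

```shell
# clean_names is a hypothetical helper, not the one-liner from this post.
# It walks depth-first and uses plain mv instead of rename.
clean_names() {
    find "$1" -depth | while IFS= read -r p; do
        [ "$p" = "$1" ] && continue          # leave the base directory alone
        dir=$(dirname "$p")
        old=$(basename "$p")
        new=$(printf '%s\n' "$old" | sed 's/[ #,]/-/g')
        [ "$old" = "$new" ] && continue      # nothing to change
        if [ -e "$dir/$new" ]; then
            echo "skipping $p: $dir/$new already exists" >&2
            continue
        fi
        mv -- "$p" "$dir/$new"
    done
}

demo=$(mktemp -d)                            # scratch directory for the demo
mkdir -p "$demo/tp/a a/b b/c c"
clean_names "$demo/tp"
result=$(cd "$demo/tp" && find . | sort)
printf '%s\n' "$result"
rm -rf "$demo"
```

On the example tree the final listing is ".", "./a-a", "./a-a/b-b" and "./a-a/b-b/c-c".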
