Thursday, October 2, 2014

Mind the spaces

Developers often forget that file and directory names may contain spaces, which can cause anything from broken awk parsing to serious damage such as deleting important files. We'll look at the latter. A common case: we find some files and then need to perform an operation on them. So, this command
find . -name "*.log" | xargs rm -f
will remove all log files recursively... or maybe it won't? The catch is that 'find' prints one path per line, while 'xargs' splits its input on any whitespace, not just on newlines. A path that contains spaces is therefore broken into several 'files', and 'rm' tries to delete each of them. Consider the following project structure:
--> tree
.
├── test
├── testdir
│   └── remove.log
└── test dir2
    └── remove2.log

2 directories, 3 files
So let's remove the logs:
--> find . -name "*.log" | xargs rm -f
and see what we have now:
--> tree
.
├── testdir
└── test dir2
    └── remove2.log

2 directories, 1 file
Oops... Where is the 'test' file? And why is 'remove2.log' still here? Let's see what happened. I'll restore the original structure from before the 'rm' and run a harmless 'ls' instead to see what xargs passes along:
--> find . -name "*.log"              
./testdir/remove.log
./test dir2/remove2.log
--> find . -name "*.log" | xargs ls -l      
ls: cannot access dir2/remove2.log: No such file or directory
-rw-rw-r-- 1 aikikode aikikode  9 Oct  2 16:17 ./test
-rw-rw-r-- 1 aikikode aikikode 13 Oct  2 16:18 ./testdir/remove.log
So xargs is passing 3 file names:

  • ./testdir/remove.log
  • ./test
  • dir2/remove2.log
instead of the original two, all because of the space in the directory name. I deliberately created a file named 'test', which matches the first part of 'test dir2' when that name is split at the space. That is why the wrong files end up being passed to the command. There are 2 options here:
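  1. Set the IFS variable before executing the command:
    IFS="$(printf '\n\t')"
    Be aware that IFS only controls word splitting done by the shell itself, e.g. in rm -f $(find . -name "*.log"); xargs does its own splitting and ignores IFS, so this does not help with the pipeline above. The syntax and handling of IFS may also differ between shells.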
  2. Use the -exec option of find:
    --> tree
    .
    ├── test
    ├── testdir
    │   └── remove.log
    └── test dir2
        └── remove2.log
    
    2 directories, 3 files
    --> find . -name "*.log" -exec rm -f {} \;
    --> tree
    .
    ├── test
    ├── testdir
    └── test dir2
    
    2 directories, 1 file
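
The whole experiment can be repeated safely in a scratch directory. Below is a minimal sketch, assuming a POSIX shell with 'mktemp -d' available. The variable names (sandbox, seen_by_xargs, remaining_logs) are made up for illustration. It rebuilds the layout above, counts how many arguments xargs produces from the output of find, and then deletes the logs with -exec.

```shell
#!/bin/sh
# Recreate the example layout in a throwaway directory.
# "sandbox" is a hypothetical name used only in this sketch.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir testdir "test dir2"
echo "keep me" > test
echo "log 1" > testdir/remove.log
echo "log 2" > "test dir2/remove2.log"

# xargs splits its input on blanks as well as newlines.
# -n 1 runs echo once per argument, so counting lines counts arguments.
seen_by_xargs=$(find . -name "*.log" | xargs -n 1 echo | wc -l)
echo "paths printed by find: 2, arguments seen by xargs: $seen_by_xargs"

# -exec hands each path to rm as a single argument, spaces included.
find . -name "*.log" -exec rm -f {} \;

remaining_logs=$(find . -name "*.log" | wc -l)
echo "log files left: $remaining_logs"
[ -f test ] && echo "the 'test' file is still there"
```

The scratch directory is left in place so the result can be inspected with tree. Where GNU or BSD tools are available, 'find . -name "*.log" -print0 | xargs -0 rm -f' is another commonly used way to keep paths intact, since it separates them with NUL bytes instead of whitespace.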