HPR4417: Newest matching file
Overview
Several years ago I wrote a Bash script to perform a task I need to perform almost every day - find the newest file in a series of files.
At this point I was running a camera on a Raspberry Pi which was attached to a window and viewed my back garden. I was taking a picture every 15 minutes, giving them names containing the date and time, and storing them in a directory. It was useful to be able to display the latest picture.
Since then, I have found searching for newest files useful in many contexts:
- Find the image generated by my random recipe chooser, put it in the clipboard and send it to the Telegram channel for my family.
- Generate a weather report from wttr.in and send it to Matrix.
- Find the screenshot I just made and put it in the clipboard.
Of course, I could just use the same name when writing these various files, rather than accumulating several, but I often want to look back through such collections. If I am concerned about such files accumulating in an unwanted way I write cron
scripts which run every day and delete the oldest ones.
Original script
The first iteration of the script was actually written as a Bash function which was loaded at login time. The function is called newest_matching_file
and it takes two arguments:
A file glob expression to match the file I am looking for.
An optional directory to look for the file. If this is omitted, then the current directory will be used.
The first version of this function was a bit awkward, since it used a for loop to scan the directory, using the glob pattern to find the file. Since a Bash glob pattern expands to the pattern itself when it matches nothing, it was necessary to use the nullglob option (see references) to prevent this, turning it on before the search and off afterwards.
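The failure mode is easy to demonstrate. This is a quick sketch (the file names are made up, and the loop is run in a fresh scratch directory):

```shell
#!/usr/bin/env bash
# Work in an empty scratch directory so nothing matches
cd "$(mktemp -d)" || exit 1

# By default a glob that matches nothing is returned literally...
for f in no_such_*.txt; do echo "$f"; done    # prints: no_such_*.txt

# ...with nullglob set, the failed glob expands to nothing at all
shopt -s nullglob
for f in no_such_*.txt; do echo "$f"; done    # prints nothing
shopt -u nullglob
```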
This technique was replaced later with a pipeline using the find
command.
Improved Bash script
The version using find
is what I will explain here.
function newest_matching_file {
    local glob_pattern=${1-}
    local dir=${2:-$PWD}

    # Argument number check
    if [[ $# -eq 0 || $# -gt 2 ]]; then
        echo 'Usage: newest_matching_file GLOB_PATTERN [DIR]' >&2
        return 1
    fi

    # Check the target directory
    if [[ ! -d $dir ]]; then
        echo "Unable to find directory $dir" >&2
        return 1
    fi

    local newest_file
    # shellcheck disable=SC2016
    newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
        -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}')

    # Use printf instead of echo in case the file name begins with '-'
    [[ -n $newest_file ]] && printf '%s\n' "$newest_file"

    return 0
}
The function is in the file newest_matching_file_1.sh, and it's loaded ("sourced", or declared) like this:

. newest_matching_file_1.sh

The '.' is a short-hand version of the command source.
I actually have two versions of this function; the second one uses a regular expression, which the find command can also search with, but I prefer this one.
Explanation
The first two lines beginning with local define variables local to the function, holding the arguments. The first, glob_pattern, is expected to contain something like screenshot_2025-04-*.png. The second will hold the directory to be scanned or, if omitted, will be set to the current directory.

Next, an if statement checks that there are the right number of arguments, aborting if not. Note that the echo command writes to STDERR (using '>&2'), the error channel.

Another if statement checks that the target directory actually exists, and aborts if not.

Another local variable, newest_file, is defined. It's good practice not to create global variables in functions, since they will "leak" into the calling environment.

The variable newest_file is set to the result of a command substitution containing a pipeline:
- The find command searches the target directory.
  - Using -maxdepth 1 limits the search to the chosen directory and does not descend into sub-directories.
  - The search pattern is defined by -name "$glob_pattern".
  - Using -type f limits the search to files.
  - The -printf "%T@ %p\n" argument prints each file's last modification time as the number of seconds since the Unix epoch ('%T@'); this number is larger for newer files. It is followed, after a space, by the full path to the file ('%p') and a newline.
- The matching lines are sorted. Because each begins with a numeric time value, they sort in ascending time order, with the newest file last.
- Finally sed is used to return the last file in the sorted list, using the program '${s/.\+ //;p}':
  - The -n option ensures that only lines which are explicitly printed will be shown.
  - The sed program looks for the last line (using '$'). When found, the leading numeric time is removed with 's/.\+ //' and the result is printed (with 'p').
- The end result will either be the path to the newest file or nothing (because there was no match).

The expression '[[ -n $newest_file ]]' is true if the $newest_file variable is not empty; if that is the case, the contents of the variable are printed on STDOUT, otherwise nothing is printed.

Note that the function returns 1 (false) if there is a failure, and 0 (true) if all is well. A null return is regarded as success.
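The heart of the function is the find pipeline, which can be checked in isolation. Here is a minimal sketch, assuming GNU find and GNU touch, with made-up file names:

```shell
#!/usr/bin/env bash
# Create two files with known, different modification times
dir=$(mktemp -d)
touch -d '2025-01-01' "$dir/pic_old.png"
touch -d '2025-06-01' "$dir/pic_new.png"

# Timestamp-prefixed listing, sorted; sed keeps only the last (newest) entry
find "$dir" -maxdepth 1 -name 'pic_*.png' -type f -printf "%T@ %p\n" |
    sort | sed -ne '${s/.\+ //;p}'
# prints the path to pic_new.png
```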
Script update
While editing the audio for this show I realised that there is a flaw in the Bash function newest_matching_file
. This is in the sed
script used to process the output from find
.
The sed command used in the script deletes all characters up to a space, assuming that this is the only space in the last line. However, if the file name itself contains spaces this will not work, because regular expressions in sed are greedy: what is deleted in this case is everything up to and including the last space.
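The difference is easy to see on a single made-up line of find output:

```shell
#!/usr/bin/env bash
# A sample line as produced by find's -printf "%T@ %p\n"
line='1746006549.000 tests/File 3 with spaces.txt'

# Greedy '.\+ ' matches up to and including the LAST space
echo "$line" | sed 's/.\+ //'       # prints: spaces.txt

# '[^ ]\+ ' cannot cross a space, so it stops at the first one
echo "$line" | sed 's/[^ ]\+ //'    # prints: tests/File 3 with spaces.txt
```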
I created a directory called tests
and added the following files:
'File 1 with spaces.txt'
'File 2 with spaces.txt'
'File 3 with spaces.txt'
I then ran the find
command as follows:
$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}'
spaces.txt
I adjusted the sed
call to sed -ne '${s/[^ ]\+ //;p}'
. This uses the regular expression:
s/[^ ]\+ //
This now specifies that what is to be removed is every non-space character up to and including the first space. The result is:
$ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}'
tests/File 3 with spaces.txt
This change has been propagated to the copy on GitLab
.
Usage
This function is designed to be used in commands or other scripts.
For example, I have an alias defined as follows:
alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/)"
This uses xclip
to load the latest screenshot into the clipboard, so I can paste it into a social media client for example.
Perl alternative
During the history of this family of scripts I wrote a Perl version. This was originally because the Bash function gave problems when run under the Bourne shell, and I was using pdmenu a lot, which internally runs scripts under that shell.
#!/usr/bin/env perl

use v5.40;
use open ':std', ':encoding(UTF-8)';    # Make all IO UTF-8

use Cwd;
use File::Find::Rule;

#
# Script name
#
( my $PROG = $0 ) =~ s|.*/||mx;

#
# Use a regular expression rather than a glob pattern
#
my $regex = shift;

#
# Get the directory to search, defaulting to the current one
#
my $dir = shift // getcwd();

#
# Have to have the regular expression
#
die "Usage: $PROG regex [DIR]\n" unless $regex;

#
# Collect all the files in the target directory without recursing. Include the
# path and let the caller remove it if they want.
#
my @files = File::Find::Rule->file()
    ->name(qr/$regex/)
    ->maxdepth(1)
    ->in($dir);

die "Unsuccessful search\n" unless @files;

#
# Sort the files by ascending modification time, youngest first
#
@files = sort {-M($a) <=> -M($b)} @files;

#
# Report the one which sorted first
#
say $files[0];

exit;
Explanation
This is a fairly straightforward Perl script, run from an executable file with a shebang line at the start indicating what is to be used to run it - perl.

The preamble defines the Perl version to use, and indicates that UTF-8 (character sets like Unicode) will be acceptable for reading and writing.

Two modules are required:
- Cwd: provides functions for determining the pathname of the current working directory.
- File::Find::Rule: provides tools for searching the file system (similar to the find command, but with more features).

Next the variable $PROG is set to the name under which the script has been invoked. This is useful when giving a brief summary of usage.

The first argument is then collected (with shift) and placed into the variable $regex.

The second argument is optional but, if omitted, is set to the current working directory. We see the use of shift again, but if this returns nothing (is undefined), the '//' operator invokes the getcwd() function to get the current working directory.

If the $regex variable is not defined, then die is called to terminate the script with an error message.

The search itself is invoked using File::Find::Rule and the results are added to the array @files. The multi-line call shows several methods being called in a "chain" to define the rules and invoke the search:
- file(): sets up a file search
- name(qr/$regex/): a rule which applies a regular expression match to each file name, rejecting any that do not match
- maxdepth(1): a rule which prevents the search from descending below the top level into sub-directories
- in($dir): defines the directory to search (and also begins the search)

If the search returns no files (the array is empty), the script ends with an error message.

Otherwise the @files array is sorted by comparing the modification times of the files, reordering the array so that the "youngest" (newest) file sorts first. The -M operator returns a file's age in days, so the smallest value belongs to the newest file. The <=> operator compares two numbers, returning -1, 0 or 1 depending on whether the left operand is less than, equal to, or greater than the right one; this is exactly the form of result the Perl sort function expects.

Finally, the newest file is reported.
Usage
This script can be used in almost the same way as the Bash variant. The difference is that the pattern used to match files is a Perl regular expression. I keep this script in my ~/bin
directory, so it can be invoked just by typing its name. I also maintain a symlink called nmf
to save typing!
The above example, using the Perl version, would be:
alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/)"
In regular expressions '.*' means "any character, zero or more times". The dot in '\.png' is escaped because we need an actual dot character.
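The effect of the unescaped dot can be shown with grep and a made-up file name:

```shell
#!/usr/bin/env bash
# 'Screenshot_.*.png' matches here because the unescaped '.' matches the '_'
# before 'png', so the name's '_png' fragment satisfies the pattern
name='Screenshot_01_pngs_backup'
echo "$name" | grep -q 'Screenshot_.*.png'  && echo 'unescaped: match'
echo "$name" | grep -q 'Screenshot_.*\.png' || echo 'escaped: no match'
```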
Conclusion
The approach in both cases is fairly simple. Files matching a pattern are accumulated, in the Bash case together with the modification time. The files are sorted by modification time and the newest one is the answer. The Bash version has to remove the modification time before printing.
This algorithm could be written in many ways. I will probably try rewriting it in other languages in the future, to see which one I think is best.
References
- Glob expansion:
- HPR shows covering glob expansion:
- GitLab repository holding these files: