HPR4417: Newest matching file

 
Overview

Several years ago I wrote a Bash script to perform a task I need almost every day: finding the newest file in a series of files.

At the time I was running a camera on a Raspberry Pi attached to a window overlooking my back garden. I was taking a picture every 15 minutes, giving each one a name containing the date and time, and storing them in a directory. It was useful to be able to display the latest picture.

Since then, I have found searching for the newest file useful in many contexts:

  • Find the image generated by my random recipe chooser, put it in the clipboard and send it to the Telegram channel for my family.

  • Generate a weather report from wttr.in and send it to Matrix.

  • Find the screenshot I just made and put it in the clipboard.

Of course, I could just use the same name when writing these various files, rather than accumulating several, but I often want to look back through such collections. If I am concerned about such files accumulating in an unwanted way, I write cron scripts which run every day and delete the oldest ones.
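
Such a clean-up job can be as small as a single find command run daily from cron. Here is a minimal sketch, assuming (purely for illustration) that the pictures live in ~/Pictures/webcam and that anything older than 30 days can go:

 # Hypothetical daily tidy-up - the directory, pattern and age are examples only.
 # Deletes matching regular files last modified more than 30 days ago.
 find "$HOME/Pictures/webcam" -maxdepth 1 -type f -name 'webcam_*.jpg' -mtime +30 -delete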

Original script

The first iteration of the script was actually written as a Bash function which was loaded at login time. The function is called newest_matching_file and it takes two arguments:

  • A file glob expression to match the file I am looking for.

  • An optional directory in which to look for the file. If this is omitted, the current directory will be used.

The first version of this function was a bit awkward: it used a for loop to scan the directory, using the glob pattern to find matching files. Because a Bash glob expands to the pattern itself when nothing matches, it was necessary to enable the nullglob option (see references) to prevent this, turning it on before the search and off afterwards.
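
That first version is not reproduced in these notes, but a loop-based approach along those lines might look something like the following sketch (illustrative only, not the original code), using Bash's -nt test to compare modification times:

 # Hypothetical sketch of a loop-based version (not the original code).
 newest_matching_file_loop () {
     local glob_pattern=${1-}
     local dir=${2:-$PWD}
     local newest='' f

     shopt -s nullglob                  # unmatched globs expand to nothing
     for f in "$dir"/$glob_pattern; do
         # '-nt' is true if the left-hand file is newer than the right-hand one
         [[ -z $newest || $f -nt $newest ]] && newest=$f
     done
     shopt -u nullglob                  # restore the default behaviour

     [[ -n $newest ]] && printf '%s\n' "$newest"
 }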

This technique was replaced later with a pipeline using the find command.

Improved Bash script

The version using find is what I will explain here.

 function newest_matching_file {
     local glob_pattern=${1-}
     local dir=${2:-$PWD}

     # Argument number check
     if [[ $# -eq 0 || $# -gt 2 ]]; then
         echo 'Usage: newest_matching_file GLOB_PATTERN [DIR]' >&2
         return 1
     fi

     # Check the target directory
     if [[ ! -d $dir ]]; then
         echo "Unable to find directory $dir" >&2
         return 1
     fi

     local newest_file

     # shellcheck disable=SC2016
     newest_file=$(find "$dir" -maxdepth 1 -name "$glob_pattern" \
         -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}')

     # Use printf instead of echo in case the file name begins with '-'
     [[ -n $newest_file ]] && printf '%s\n' "$newest_file"

     return 0
 }

The function is in the file newest_matching_file_1.sh, and it's loaded ("sourced", or declared) like this:

 . newest_matching_file_1.sh 

The '.' is a short-hand version of the source command.
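
To make the function available at login it is enough to put that source line in a shell start-up file. Assuming Bash, a ~/.bashrc, and that the file is kept in ~/bin (all assumptions for the example), something like:

 # In ~/.bashrc (assumed locations of the start-up file and the function file)
 . "$HOME/bin/newest_matching_file_1.sh"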

I actually have two versions of this function; the second one uses a regular expression, which the find command can also search with, but I prefer the glob-based one shown here.
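
That second version is not reproduced here either, but as a rough, hypothetical sketch it might build the pipeline around find's -regex test instead of -name, something like this (the regex_pattern variable is invented for the example; note that -regex matches against the whole path, hence the leading .*/):

 # Hypothetical sketch only - not the actual second version.
 newest_file=$(find "$dir" -maxdepth 1 -regextype posix-extended \
     -regex ".*/$regex_pattern" -type f -printf "%T@ %p\n" |
     sort | sed -ne '${s/.\+ //;p}')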

Explanation

  • The first two lines beginning with local define variables local to the function, holding the arguments. The first, glob_pattern, is expected to contain something like screenshot_2025-04-*.png. The second will hold the directory to be scanned or, if omitted, will be set to the current directory.

  • Next, an if statement checks that there are the right number of arguments, aborting if not. Note that the echo command writes to STDERR (using '>&2' ), the error channel.

  • Another if statement checks that the target directory actually exists, and aborts if not.

  • Another local variable newest_file is defined. It's good practice not to create global variables in functions since they will "leak" into the calling environment.

  • The variable newest_file is set to the result of a command substitution containing a pipeline:

    • The find command searches the target directory.
      • Using -maxdepth 1 limits the search to the chosen directory and does not descend into sub-directories.
      • The search pattern is defined by -name "$glob_pattern".
      • Using -type f limits the search to files.
      • The -printf "%T@ %p\n" argument returns the file's last modification time as the number of seconds since the Unix epoch ('%T@'). This number is larger for more recently modified (newer) files. It is followed, after a space, by the full path to the file ('%p'), and a newline.
    • The matching file names are sorted. Because each line begins with this numeric time value, the lines are sorted in ascending order of modification time, so the newest file ends up last.
    • Finally sed is used to return the last file in the sorted list with the program '${s/.\+ //;p}' (a worked example of the whole pipeline is shown after this list):
      • The use of the -n option ensures that only lines which are explicitly printed will be shown.
      • The sed program looks for the last line (using '$'). When it is found, the leading numeric time is removed with 's/.\+ //' and the result is printed (with 'p').
    • The end result will either be the path to the newest file or nothing (because there was no match).
  • The expression '[[ -n $newest_file ]]' will be true if the $newest_file variable is not empty, and if that is the case, the contents of the variable will be printed on STDOUT; otherwise nothing will be printed.

  • Note that the script returns 1 (false) if there is a failure, and 0 (true) if all is well. A null return is regarded as success.
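
To make the pipeline concrete, here is an illustrative run over a hypothetical directory called demo (the file names and timestamps are invented for the example):

 $ find demo -maxdepth 1 -name 'file_*.txt' -type f -printf "%T@ %p\n" | sort
 1745000000.0000000000 demo/file_one.txt
 1746000000.0000000000 demo/file_two.txt
 1747000000.0000000000 demo/file_three.txt
 $ find demo -maxdepth 1 -name 'file_*.txt' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}'
 demo/file_three.txt

The sort puts the oldest file first and the newest last, and the sed program prints only that last line with its leading timestamp stripped off.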

Script update

While editing the audio for this show I realised that there is a flaw in the Bash function newest_matching_file. This is in the sed script used to process the output from find.

The sed commands used in the script delete all characters up to a space, assuming that this is the only space in the last line. However, if the file name itself contains spaces, this will not work because regular expressions in sed are greedy. What is deleted in this case is everything up to and including the last space.
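
The greedy behaviour is easy to see on a single line of made-up input of the same form as the find output:

 $ echo '1746000000.0000000000 File 3 with spaces.txt' | sed 's/.\+ //'
 spaces.txt

Everything up to and including the last space has been consumed by '.\+ ', not just the leading timestamp.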

I created a directory called tests and added the following files:

 'File 1 with spaces.txt' 'File 2 with spaces.txt' 'File 3 with spaces.txt' 

I then ran the find command as follows:

 $ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/.\+ //;p}'
 spaces.txt

I adjusted the sed call to sed -ne '${s/[^ ]\+ //;p}'. This uses the regular expression:

 s/[^ ]\+ // 

This now specifies that what is to be removed is every non-space character up to and including the first space. The result is:

 $ find tests -maxdepth 1 -name 'File*' -type f -printf "%T@ %p\n" | sort | sed -ne '${s/[^ ]\+ //;p}'
 tests/File 3 with spaces.txt

This change has been propagated to the copy on GitLab.

Usage

This function is designed to be used in commands or other scripts.

For example, I have an alias defined as follows:

 alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(newest_matching_file 'Screenshot_*.png' ~/Pictures/Screenshots/)" 

This uses xclip to load the latest screenshot into the clipboard, so I can paste it into a social media client for example.
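
The function can also be used defensively inside other scripts. Here is a minimal sketch (the sourced path, directory and pattern are invented for the example):

 #!/bin/bash
 # Minimal sketch: act on the newest matching file, or complain if there is none.
 # The paths and pattern below are examples only.

 . "$HOME/bin/newest_matching_file_1.sh"

 latest=$(newest_matching_file 'weather_*.png' "$HOME/reports") || exit 1

 if [[ -z $latest ]]; then
     echo 'No matching report found' >&2
     exit 1
 fi

 echo "Newest report: $latest"

Because the function prints nothing (but still returns success) when no file matches, the emptiness test is what detects the "no match" case.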

Perl alternative

During the history of this family of scripts I wrote a Perl version. This was originally because the Bash function gave problems when run under the Bourne shell, and I was using pdmenu a lot, which internally runs scripts under that shell.

 #!/usr/bin/env perl

 use v5.40;
 use open ':std', ':encoding(UTF-8)'; # Make all IO UTF-8

 use Cwd;
 use File::Find::Rule;

 #
 # Script name
 #
 ( my $PROG = $0 ) =~ s|.*/||mx;

 #
 # Use a regular expression rather than a glob pattern
 #
 my $regex = shift;

 #
 # Get the directory to search, defaulting to the current one
 #
 my $dir = shift // getcwd();

 #
 # Have to have the regular expression
 #
 die "Usage: $PROG regex [DIR]\n" unless $regex;

 #
 # Collect all the files in the target directory without recursing. Include the
 # path and let the caller remove it if they want.
 #
 my @files = File::Find::Rule->file()
     ->name(qr/$regex/)
     ->maxdepth(1)
     ->in($dir);

 die "Unsuccessful search\n" unless @files;

 #
 # Sort the files by ascending modification time, youngest first
 #
 @files = sort {-M($a) <=> -M($b)} @files;

 #
 # Report the one which sorted first
 #
 say $files[0];

 exit;

Explanation

  • This is a fairly straightforward Perl script, run from an executable file with a shebang line at the start indicating what is to be used to run it: perl.

  • The preamble defines the Perl version to use, and indicates that UTF-8 (an encoding of the Unicode character set) will be used for reading and writing.

  • Two modules are required:

    • Cwd : provides functions for determining the pathname of the current working directory.
    • File::Find::Rule : provides tools for searching the file system (similar to the find command, but with more features).
  • Next the variable $PROG is set to the name under which the script has been invoked. This is useful when giving a brief summary of usage.

  • The first argument is then collected (with shift ) and placed into the variable $regex .

  • The second argument is optional, but if omitted, is set to the current working directory. We see the use of shift again, but if this returns nothing (is undefined), the '//' operator invokes the getcwd() function to get the current working directory.

  • If the $regex variable is not defined, then die is called to terminate the script with an error message.

  • The search itself is invoked using File::Find::Rule and the results are added to the array @files . The multi-line call shows several methods being called in a "chain" to define the rules and invoke the search:

    • file() : sets up a file search
    • name(qr/$regex/) : a rule which applies a regular expression match to each file name, rejecting any that do not match
    • maxdepth(1) : a rule which prevents the search from descending below the top level into sub-directories
    • in($dir) : defines the directory to search (and also begins the search)
  • If the search returns no files (the array is empty), the script ends with an error message.

  • Otherwise the @files array is sorted. This is done by comparing the ages of the files (Perl's -M operator returns a file's age in days, so smaller values mean newer files), with the array being reordered such that the "youngest" (newest) file is sorted first. The <=> operator compares two numbers and returns -1, 0 or 1 depending on whether the left operand is less than, equal to or greater than the right one, which is exactly the kind of comparison the Perl sort function expects.

  • Finally, the newest file is reported.

Usage

This script can be used in almost the same way as the Bash variant. The difference is that the pattern used to match files is a Perl regular expression. I keep this script in my ~/bin directory, so it can be invoked just by typing its name. I also maintain a symlink called nmf to save typing!

The above example, using the Perl version, would be:

 alias copy_screenshot="xclip -selection clipboard -t image/png -i \$(nmf 'Screenshot_.*\.png' ~/Pictures/Screenshots/)" 

In regular expressions '.*' means "any character zero or more times". The '.' in '.png' is escaped because we need an actual dot character.

Conclusion

The approach in both cases is fairly simple. Files matching a pattern are accumulated, in the Bash case including the modification time. The files are sorted by modification time and the newest one is the answer: the largest timestamp (last in the sorted list) in the Bash version, and the smallest file age (first in the sorted list) in the Perl version. The Bash version has to remove the modification time before printing.

This algorithm could be written in many ways. I will probably try rewriting it in other languages in the future, to see which one I think is best.

References

Provide feedback on this episode.
