HPR4407: A 're-response' Bash script
This show has been flagged as Explicit by the host.
Introduction
On 2025-06-19 Ken Fallon did a show, number 4404, responding to Kevie's show 4398, which came out on 2025-06-11.
Kevie was using a Bash pipeline to find the latest episode in an RSS feed, and download it. He used grep
to parse the XML of the feed.
Ken's response was to suggest the use of xmlstarlet
to parse the XML because such a complex structured format as XML cannot reliably be parsed without a program that "understands" the intricacies of the format's structure. The same applies to other complex formats such as HTML, YAML and JSON.
In his show Ken presented a Bash script which dealt with this problem and that of the ordering of episodes in the feed. He asked how others would write such a script, and thus I was motivated to produce this response to his response!
Alternative script
My script is a remodelling of Ken's, not a completely different solution. It contains a few alternative ways of doing what Ken did, and a reordering of the parts of his original. We will examine the changes in this episode.
Script
#!/bin/bash
# Original (c) CC-0 Ken Fallon 2025
# Modified by Dave Morriss, 2025-06-14 (c) CC-0

podcast="https://tuxjam.otherside.network/feed/podcast/"

# [1]
while read -r item
do
    # [2]
    pubDate="${item%;*}"
    # [3]
    pubDate="$( \date --date="${pubDate}" --universal +%FT%T )"
    # [4]
    url="${item#*;}"
    # [5]
    echo "${pubDate};${url}"
done < <(curl --silent "${podcast}" | \
    xmlstarlet sel --text --template --match 'rss/channel/item' \
        --value-of 'concat(pubDate, ";", enclosure/@url)' --nl - ) | \
    sort --numeric-sort --reverse | \
    head -1 | \
    cut -f2 -d';' | \
    wget --quiet --input-file=- # [6]
I have placed some comments in the script in the form of '# [1]', and I'll refer to these as I describe the changes in the following numbered list.
Note: I checked, and the script will run with the comments, though they are only there to make it easier to refer to things.
1. The format of the pipeline is different. It starts by defining a while loop, but the data which the read command receives comes from a process substitution of the form '<(statements)' (see the process substitution section of "hpr2045 :: Some other Bash tips"). I have arranged the pipeline in this way because it's bad practice to place a while loop in a pipeline, as discussed in the show hpr3985 :: Bash snippet - be careful when feeding data to loops. (I added -r to the read because shellcheck, which I run in the vim editor, nagged me!)

2. The lines coming from the process substitution are from running curl to collect the feed, then using xmlstarlet to pick out the pubDate field of each item and the url attribute of its enclosure field, returning them as two strings separated by a semicolon (';'). This is from Ken's original code. Each line is read into the variable item, and the first element (before the semicolon) is extracted with the Bash expression "${item%;*}". Parameter manipulation expressions were introduced in HPR show 1648. See the full notes section Remove matching suffix pattern for this one.

3. I modified Ken's date command to simplify the generation of the ISO 8601 date and time by using the format pattern +%FT%T. This just saves typing!

4. The url value is extracted from the contents of item with the expression "${item#*;}". See the section of show 1648 entitled Remove matching prefix pattern for details.

5. The echo which generates the list of podcast URLs prefixed with an ISO time stamp uses ';' as the delimiter where Ken used a tab character. I assume this was done for the benefit of either the following sort or the awk script. It's not needed for sort, since it sorts each line as-is and doesn't use fields. My version doesn't use awk.

6. Rather than using awk, I use cut to remove the time stamp from the front of each line, returning the second field delimited by the semicolon. The result of this will be the URL for wget to download. In this case wget receives the URL on standard input (STDIN), and the --input-file=- option tells it to use that information for the download.
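The point in [1] about not placing a while loop in a pipeline can be demonstrated with a short sketch. The input data here is invented for illustration, and this requires bash rather than a plain POSIX shell:

```shell
#!/bin/bash
# A variable modified inside a piped 'while' loop runs in a subshell,
# so the change is lost once the pipeline ends.
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
done
echo "after pipe: ${count}"    # still 0 in bash's default mode

# With process substitution the loop runs in the current shell,
# so the variable keeps its value afterwards.
count=0
while read -r line; do
    count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "after process substitution: ${count}"    # 3
```

This is exactly why the script feeds the loop from '<(...)' rather than piping into it: any variables set inside the loop remain available after 'done'.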
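The parameter expansions from [2] and [4], and the date format from [3], can be tried at the command line. Here is a small sketch using an invented feed line in the same 'pubDate;url' shape (GNU date assumed):

```shell
#!/bin/bash
# Invented sample line, in the format produced by the xmlstarlet 'concat'
item='Wed, 11 Jun 2025 00:00:00 +0000;https://example.com/episode.mp3'

# Remove the shortest suffix matching ';*' -> keeps the date part
pubDate="${item%;*}"
# Remove the shortest prefix matching '*;' -> keeps the URL part
url="${item#*;}"

echo "${pubDate}"    # Wed, 11 Jun 2025 00:00:00 +0000
echo "${url}"        # https://example.com/episode.mp3

# '+%FT%T' is shorthand for '+%Y-%m-%dT%H:%M:%S'
date --date="${pubDate}" --universal +%FT%T    # 2025-06-11T00:00:00
```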
Conclusion
I'm not sure my solution is better in any significant way. I prefer to use Bash functionality to do things where calling awk or sed could be overkill, but that's just a personal preference.
I might have replaced the head and cut with a sed expression, such as the following as the last line:
sed -e '1{s/^.\+;//;q}' | wget --quiet --input-file=-
Here, the sed expression operates on the first line from the sort, where it removes everything from the start of the line up to and including the semicolon. The expression then causes sed to quit, so that only the edited first line is passed to wget.
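That sed behaviour can be checked with some invented input in the same 'timestamp;url' format (GNU sed assumed, since '\+' in a basic regular expression is a GNU extension):

```shell
#!/bin/bash
# Two invented lines, newest first, as they would emerge from 'sort --reverse'
sorted='2025-06-11T00:00:00;https://example.com/new.mp3
2025-06-04T00:00:00;https://example.com/old.mp3'

# On line 1 only: delete everything up to and including the semicolon,
# then quit, so later lines are never printed.
printf '%s\n' "${sorted}" | sed -e '1{s/^.\+;//;q}'
# -> https://example.com/new.mp3
```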