Since I started doing my "radio show" on YouTube, I've been developing bash scripts (shell scripting) that use the YouTube Data API (v3) to automatically extract information from my playlists and store it in files formatted in a way that's useful to me. In particular, as I'm putting a show together, one of the key things I need to know is how long it runs. For my own shows I have specific sorting to separate the commentary videos I make from the music itself, but I also needed a generic script that works on any playlist (with no special sorting). The script uses standard Linux utilities plus "curl" to make the API queries; in case it isn't clear, the YouTube API is URL-based. One of the big things I had to figure out is that YouTube returns a maximum of 50 entries per query, so it provides a "nextPageToken" that must be passed back to get the next 50 (or fewer) entries. I start with nextPageToken as an empty string (YouTube accepts an empty value and returns the first page), and then set the nextPageToken variable to the token returned for the next page (gasp!). When a response doesn't contain a "nextPageToken" key, it's the last page, and I use that condition to exit the loop.
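The paging logic just described can be tried offline. This is a minimal sketch, not the script itself: the hypothetical function `fetch_page` stands in for the curl call and returns canned JSON pages, so you can watch the loop follow the tokens and stop on the page that has no "nextPageToken".

```bash
#!/bin/bash
# Sketch of the nextPageToken loop. fetch_page is a hypothetical stub that
# replaces the real curl request so this runs without network or API key.
fetch_page() {
    case "$1" in
        "")      printf '{\n  "nextPageToken": "TOKEN_B",\n  "items": []\n}\n' ;;
        TOKEN_B) printf '{\n  "nextPageToken": "TOKEN_C",\n  "items": []\n}\n' ;;
        TOKEN_C) printf '{\n  "items": []\n}\n' ;;   # last page: no nextPageToken
    esac
}

nextPageToken=""   # an empty token asks for the first page
pagesFetched=0
while :; do
    page=$(fetch_page "$nextPageToken")
    pagesFetched=$((pagesFetched + 1))
    # Same extraction idea as the script: take the value of "nextPageToken", if present.
    nextPageToken=$(printf '%s\n' "$page" | grep '"nextPageToken"' | sed 's/.*: "\(.*\)",/\1/')
    if [[ -z $nextPageToken ]]; then
        break   # no token in this response, so it was the last page
    fi
done
echo "Fetched $pagesFetched pages"
```

Run as-is, it prints `Fetched 3 pages`.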
The script has three major parts: getting the basic playlist info, getting the full contents of the playlist (in particular the track names and video IDs), and then using those IDs to get the duration of each track. It builds everything into one file: a header with the title and summary information, followed by a list of the tracks with their times. The result is stored in "./playlists/<playlistID>/playlistInfo.txt" (the timestamped output of each run, playlistInfo_<timestamp>.txt, is kept in the same directory as a backup). The directories are created automatically. You'll need a developer account with YouTube to get an API key of your own before you can run this script. If you want to see how it all works, comment out the file deletions and look at the intermediate results or, even better, run the "curl" commands from the command line and see what comes back (the results are in JSON, which I parse directly with grep and sed).
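To show what parsing the JSON "directly" looks like, here is the same grep/sed extraction the script applies to the playlist header, run against a small sample. The JSON is hypothetical and heavily abbreviated (an assumed shape with only the nesting and the two fields the script reads), filled in with the title and date from the output shown later in this post. Like the script, it assumes one field per line, as in the API's pretty-printed responses.

```bash
#!/bin/bash
# Hypothetical, abbreviated response shaped like a playlists reply;
# a real reply contains many more fields.
sampleJson='{
  "items": [
    {
      "snippet": {
        "publishedAt": "2021-07-03T23:33:42Z",
        "title": "S01 | EP11 – The Passionate Friar on YouTube (2021/07/11)",
        "description": ""
      }
    }
  ]
}'

# Take the first matching line and keep the text between `: "` and the trailing `",`.
title=$(printf '%s\n' "$sampleJson" | grep '"title"' | head -1 | sed 's/.*: "\(.*\)",/\1/')
published=$(printf '%s\n' "$sampleJson" | grep '"publishedAt"' | head -1 | sed 's/.*: "\(.*\)",/\1/')
printf 'Title: "%s"\nPublished: %s\n' "$title" "$published"
```

Because it is line-based, this approach breaks if a response is ever returned as a single compact line; a real JSON parser avoids that, at the cost of an extra dependency.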
The script takes one parameter: the playlist ID. If you open a YouTube playlist, the playlist ID is the value in the playlist's URL that comes after "list=" and starts with "PL" (include the "PL" as part of the ID). You will, of course, need the rights to read information from the playlist. I've only run it on my own playlists, so I don't know what the result would be if you ran it on one of mine (I'd be curious).
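If you'd rather paste a whole URL than copy the ID out by hand, a few lines of bash parameter expansion can pull out the value of the "list=" parameter. This is a hypothetical helper (`playlist_id_from_url` is not part of the script), sketched under the assumption that the URL contains a single "list=" parameter.

```bash
#!/bin/bash
# Hypothetical helper: print the value of the "list=" parameter in a YouTube URL.
playlist_id_from_url() {
    local url=$1
    if [[ $url != *list=* ]]; then
        echo "no list= parameter in: $url" >&2
        return 1
    fi
    local id=${url#*list=}   # drop everything up to and including "list="
    id=${id%%&*}             # drop any parameters that follow (e.g. &index=2)
    printf '%s\n' "$id"
}

playlist_id_from_url "https://www.youtube.com/playlist?list=PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx"
```

The result could then be passed straight to the script, e.g. `./getArbitraryPlaylist.sh "$(playlist_id_from_url "$url")"`.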
To invoke it on the playlist I have at the bottom of the post (my show Season 1, Episode 11), I would use:
```bash
./getArbitraryPlaylist.sh PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx
```

The playlist has 15 entries in it and runs for 1h10m07s (all the videos, including my parts).
The output I get is:
```
URL: https://www.youtube.com/playlist?list=PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx
Title: "S01 | EP11 – The Passionate Friar on YouTube (2021/07/11)"
Published: 2021-07-03T23:33:42Z
Track Count: 15
Total time: 1h10m07s

5:33 – S01 | EP11 | COMMENTARY No. 1 of 4 – The Passionate Friar on YouTube
1:36 – Hell - Clown Core
3:05 – Valentino Khan - Deep Down Low (Official Music Video)
2:37 – IGORRR - VERY NOISE
6:27 – S01 | EP11 | COMMENTARY No. 2 of 4 – The Passionate Friar on YouTube
4:08 – Khruangbin - Evan Finds The Third Room (Official Video)
6:16 – Kamasi Washington - Street Fighter Mas
3:59 – Chelou - Damned Eye See (Official Video)
4:28 – Mcbaise - Water Slide (feat. Kamggarn)
6:14 – S01 | EP11 | COMMENTARY No. 3 of 4 – The Passionate Friar on YouTube
3:15 – Siouxsie And The Banshees - Peek-A-Boo
4:26 – Depeche Mode - Never Let Me Down Again (Official Video) (Heard on Episode 1 of The Last Of Us)
3:36 – FKA twigs - How's That
3:39 – S01 | EP11 | COMMENTARY No. 4 of 4 – The Passionate Friar on YouTube
10:48 – Animal Collective - Bridge To Quiet (Official Video)
```

And here's the script itself (if you have any questions about it, I'll try to answer if you ask):
```bash
#!/bin/bash
# getArbitraryPlaylist.sh: summarize a YouTube playlist (title, track count,
# total running time, per-track times) using the YouTube Data API v3.

# Exactly one argument is expected: the playlist ID.
if [[ $# == 1 ]]; then
    ytPlaylistID=$1
else
    echo "Usage: getArbitraryPlaylist.sh <playlistID>"
    exit 1
fi

# Output goes to ./playlists/<playlistID>/, one timestamped file per run.
curTimeStamp=`date +%Y%m%d%H%M%S`
playlistDirPath="./playlists/$ytPlaylistID"
if [ ! -d $playlistDirPath ]; then
    mkdir -p $playlistDirPath
fi
targetFile=$playlistDirPath/playlistInfo_"$curTimeStamp".txt

ytKey="*** YOU NEED TO PUT YOUR OWN YOUTUBE AUTHENTICATION KEY HERE ***"
ytMaxResults="50"    # the API returns at most 50 items per page

# Part 1: playlist title and publishing time (playlists endpoint).
echo "==> Getting playlist title and publishing time..."
tempFile=$playlistDirPath/"$curTimeStamp"_headerInfo.tmp
curl "https://www.googleapis.com/youtube/v3/playlists?part=snippet&id=$ytPlaylistID&key=$ytKey" --header 'Accept: application/json' --compressed > $tempFile
echo "URL: https://www.youtube.com/playlist?list=$ytPlaylistID" > $targetFile
printf "Title: \"%s\"\n" "`cat $tempFile | grep "\"title\"" | head -1 | sed 's/.*: "\(.*\)",/\1/'`" >> $targetFile
printf "Published: %s\n" "`cat $tempFile | grep "\"publishedAt\"" | head -1 | sed 's/.*: "\(.*\)",/\1/'`" >> $targetFile
rm $tempFile

# Part 2: track titles and video IDs (playlistItems endpoint), one page at a
# time. An empty pageToken returns the first page; the loop ends when a
# response contains no "nextPageToken".
tempFile=$playlistDirPath/"$curTimeStamp"_rawInfo.tmp
detailsTemp=$playlistDirPath/"$curTimeStamp"_playlistDetails.tmp
rm -f $detailsTemp
nextPageToken=""
while :
do
    echo "==> Loading a page of playlist details..."
    curl 'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&maxResults='$ytMaxResults'&playlistId='$ytPlaylistID'&key='$ytKey'&pageToken='$nextPageToken --header 'Accept: application/json' --compressed > $tempFile
    videoCount=`tail -10 $tempFile | grep "\"totalResults\"" | sed 's/.*: \(.*\),/\1/'`
    nextPageToken=`head -10 $tempFile | grep "\"nextPageToken\"" | sed 's/.*: "\(.*\)",/\1/'`
    cat $tempFile | grep -E "\"title\"|\"videoId\"" >> $detailsTemp
    if [[ -z $nextPageToken ]]; then
        break
    fi
done
echo "Track Count: $videoCount" >> $targetFile
rm $tempFile

# Split the collected lines into a names file (each name prefixed with an
# en dash, bytes \xe2\x80\x93) and an IDs file.
namesTemp=$playlistDirPath/"$curTimeStamp"_playlistNames.tmp
idsTemp=$playlistDirPath/"$curTimeStamp"_playlistIds.tmp
timesTemp=$playlistDirPath/"$curTimeStamp"_playlistTimes.tmp
grep "\"title\"" $detailsTemp | sed 's/.*: "\(.*\)",/\xe2\x80\x93 \1/' > $namesTemp
grep "\"videoId\"" $detailsTemp | sed 's/.*: "\(.*\)"/\1/' > $idsTemp
rm $detailsTemp

# Part 3: one videos request per ID to read the ISO 8601 duration, which the
# sed chain turns into m:ss (PT45S -> 0:45, PT3M5S -> 3:05, PT10M -> 10:00).
# Durations with an hours part (PT1H...) pass through this chain unchanged.
rm -f $timesTemp
for i in `cat $idsTemp`; do
    echo "==> Getting song details... Index = $i"
    curl 'https://www.googleapis.com/youtube/v3/videos?id='$i'&part=contentDetails&key='$ytKey --header 'Accept: application/json' --compressed |
        grep "duration" |
        sed 's/.*: "\(.*\)",/\1/' |
        sed 's/PT\([0-9]*\)S/PT0M\1S/' |
        sed 's/PT\([0-9]*\)M\([0-9]*\)S/\1:0\2/' |
        sed 's/\([0-9]*\):.*\([0-9][0-9]\)$/\1:\2/' |
        sed 's/^PT\([0-9]*\)M/\1:00/' >> $timesTemp
done
rm $idsTemp

# Total the times. Leading zeros are stripped ("05" -> "5") because an
# integer variable would otherwise read values like "08" as invalid octal.
declare -i timeMin
declare -i timeSec
let minSum=0
let secSum=0
while read -r timeStr; do
    timeMin=`echo $timeStr | cut -d':' -f1 | sed 's/0\([0-9]\)/\1/'`
    timeSec=`echo $timeStr | cut -d':' -f2 | sed 's/0\([0-9]\)/\1/'`
    let minSum=minSum+timeMin
    let secSum=secSum+timeSec
done < $timesTemp
let minSum=minSum+secSum/60
let hourCnt=minSum/60
let minRem=minSum%60
let secRem=secSum%60
printf "Total time: %dh%02dm%02ds\n\n" $hourCnt $minRem $secRem >> $targetFile

# Pair each time with its title, clean up, and copy this run's timestamped
# file to playlistInfo.txt as the latest result.
paste -d' ' $timesTemp $namesTemp >> $targetFile
rm $timesTemp $namesTemp
cp $targetFile $playlistDirPath/playlistInfo.txt
exit 0
```
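The sed chain in the script handles minutes and seconds, but a duration with an hours component (for example PT1H2M3S) comes through unconverted, so a track an hour or longer would not be counted correctly in the total. A minimal alternative sketch, assuming durations of the PT#H#M#S form, converts each one to seconds with a bash regular expression and then totals them. `iso_duration_to_seconds` is a hypothetical helper name, and day-length durations (P#DT...) are not handled. The sample list is the 15 track times from the output above, rewritten as ISO 8601 durations.

```bash
#!/bin/bash
# Hypothetical helper: convert an ISO 8601 duration like PT5M33S, PT45S, PT10M
# or PT1H2M3S to a number of seconds. Day components (P1DT...) are rejected.
iso_duration_to_seconds() {
    local re='^PT(([0-9]+)H)?(([0-9]+)M)?(([0-9]+)S)?$'
    if [[ $1 =~ $re ]]; then
        # Missing parts default to 0; 10# forces base 10 so "08" is not octal.
        echo $(( 10#${BASH_REMATCH[2]:-0} * 3600 + 10#${BASH_REMATCH[4]:-0} * 60 + 10#${BASH_REMATCH[6]:-0} ))
    else
        echo "unrecognized duration: $1" >&2
        return 1
    fi
}

# The 15 track times from the sample output, written as ISO 8601 durations.
durations="PT5M33S PT1M36S PT3M5S PT2M37S PT6M27S PT4M8S PT6M16S PT3M59S
PT4M28S PT6M14S PT3M15S PT4M26S PT3M36S PT3M39S PT10M48S"

totalSec=0
for d in $durations; do
    s=$(iso_duration_to_seconds "$d") || continue   # skip anything unparseable
    totalSec=$((totalSec + s))
done
printf 'Total time: %dh%02dm%02ds\n' $((totalSec / 3600)) $((totalSec % 3600 / 60)) $((totalSec % 60))
```

This prints `Total time: 1h10m07s`, matching the script's result. Summing plain seconds and splitting into hours, minutes, and seconds only once at the end also removes the need to carry seconds into minutes separately.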
Example YouTube playlist (my show, Season 1, Episode 11):