pheloniusfriar
Since I've been doing my "radio show" on YouTube, I've been developing tools in bash script (shell scripting) that use the YouTube API (v3) to automatically extract information from my playlists and store it in files formatted in a way that's useful to me. In particular, as I'm putting a show together, one of the key things I need to know is how long it runs. For my own show I have specific sorting to do to separate the commentary videos I make from the music itself, but I also needed a generic script that works on any playlist (with no special sorting). The script uses standard Linux utilities plus "curl" to do the API queries. If it isn't clear, the YouTube API is URL-based: each query is just an HTTP GET request to a URL, with the parameters in the query string.

One of the big things I needed to figure out is that YouTube will return a maximum of 50 entries per query, so it provides a "nextPageToken" that needs to be passed back (as the "pageToken" parameter) to get the next 50 (or fewer) entries. I start with my nextPageToken as an empty string (YouTube accepts an empty value and returns the first page), and then set the nextPageToken variable to the value returned for the next page's token (gasp!). When the results don't contain a "nextPageToken" keyword, it's the last page, and I use that condition to exit the loop.
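Here's a stripped-down sketch of that paging loop that runs without network access. fake_fetch_page is a hypothetical stand-in for the curl call: it returns canned, pretty-printed JSON for an empty token, then for "TOKEN2", then for "TOKEN3" (made-up token values), and only the last page leaves out nextPageToken. The token extraction is the same grep/sed pattern the script uses.

```bash
#!/bin/bash
# Paging loop sketch. fake_fetch_page is hypothetical: it stands in for the
# curl request and returns canned JSON instead of calling the YouTube API.
fake_fetch_page() {
    # $1 is the page token; an empty token means "give me the first page".
    case "$1" in
        "")     printf '{\n  "nextPageToken": "TOKEN2",\n  "items": ["a", "b"]\n}\n' ;;
        TOKEN2) printf '{\n  "nextPageToken": "TOKEN3",\n  "items": ["c"]\n}\n' ;;
        TOKEN3) printf '{\n  "items": ["d"]\n}\n' ;;
    esac
}

pageCount=0
nextPageToken=""
while :; do
    page=$(fake_fetch_page "$nextPageToken")
    pageCount=$((pageCount + 1))
    # Same extraction as the script: the value of "nextPageToken", if present.
    nextPageToken=$(printf '%s\n' "$page" | grep '"nextPageToken"' | sed 's/.*: "\(.*\)",/\1/')
    # No nextPageToken keyword means this was the last page.
    if [[ -z $nextPageToken ]]; then
        break
    fi
done

echo "Fetched $pageCount pages"
```

With the canned data this prints "Fetched 3 pages": two pages carry a token and the third ends the loop.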

The script has three major parts: getting the basic playlist info, getting the full contents of the playlist (in particular the track names and video IDs), and then using those IDs to get the duration of each track in the playlist. It builds it all into one file with a header containing the title and summary information, followed by a list of the tracks with their times. The result is stored in the file "./playlists/<playlistID>/playlistInfo.txt", and a timestamped copy of each run (playlistInfo_<timestamp>.txt) is kept in the same directory as a backup. The directories are created automatically. You'll need a Google developer account and your own YouTube API key before you can run this script (it goes in the ytKey variable near the top). If you want to find out how it all works, comment out the file deletions ("rm") and look at the intermediate results or, even better, run the "curl" commands from the command line and see what comes back (the results are in JSON format, which I parse directly with grep and sed).
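Because the JSON is parsed directly with grep and sed rather than a JSON tool, it helps to see what that parsing relies on. This sketch runs the script's title/publishedAt extraction over a hand-written sample shaped like a playlists response; the sample (including the "UCxxxxxxxx" channel ID and the "My Example Playlist" title) is made up.

```bash
#!/bin/bash
# Header-field extraction, same grep/sed as the script, on a made-up sample.
# Assumes the API's pretty-printing: one "key": "value", pair per line.
sample='{
  "items": [
    {
      "snippet": {
        "publishedAt": "2021-07-03T23:33:42Z",
        "channelId": "UCxxxxxxxx",
        "title": "My Example Playlist",
        "description": ""
      }
    }
  ]
}'

# Take the first matching line and keep whatever sits between the quotes.
title=$(printf '%s\n' "$sample" | grep '"title"' | head -1 | sed 's/.*: "\(.*\)",/\1/')
published=$(printf '%s\n' "$sample" | grep '"publishedAt"' | head -1 | sed 's/.*: "\(.*\)",/\1/')

printf 'Title: "%s"\n' "$title"
printf 'Published: %s\n' "$published"
```

The sed pattern only matches lines ending in `",`, so it depends on one field per line and on the field not being the last one in its object. That's why the script uses a different expression (without the trailing comma) for "videoId".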

The script takes one parameter: the playlist ID. If you go to a YouTube playlist, the playlist ID is the value of the "list=" parameter in the playlist's URL, and it starts with "PL" (you need to include the PL as part of the playlist ID). You will, of course, need the rights to read information from the playlist. I'm only running it on mine, so I don't know what the result would be if you ran it on a playlist of mine (I'd be curious).
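If you'd rather start from the whole playlist URL, the ID can be pulled out with bash parameter expansion. This is a small sketch: extract_playlist_id is a hypothetical helper that isn't part of the script, and it assumes the ID runs from "list=" up to the next "&" (or the end of the URL).

```bash
#!/bin/bash
# Hypothetical helper: print the value of the list= parameter from a YouTube URL.
extract_playlist_id() {
    local url=$1
    [[ $url == *list=* ]] || return 1   # no list= parameter: not a playlist URL
    local id=${url#*list=}              # drop everything up to and including "list="
    id=${id%%&*}                        # drop any parameters that follow (e.g. "&index=2")
    printf '%s\n' "$id"
}

extract_playlist_id "https://www.youtube.com/playlist?list=PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx"
extract_playlist_id "https://www.youtube.com/watch?v=abc123&list=PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx&index=2"
```

Both calls print PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx, so the script could be invoked as `./getArbitraryPlaylist.sh "$(extract_playlist_id "$url")"`.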

To invoke it on the playlist I have at the bottom of the post (my show Season 1, Episode 11), I would use:
./getArbitraryPlaylist.sh PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx
The playlist has 15 entries in it and runs for 1h10m07s (all the videos including my parts).

The output I get is:
URL: https://www.youtube.com/playlist?list=PLcbc6Su4uUe8VxRCRH74ZO8_mgsPOkGQx
Title: "S01 | EP11 – The Passionate Friar on YouTube (2021/07/11)"
Published: 2021-07-03T23:33:42Z
Track Count: 15
Total time: 1h10m07s

5:33 – S01 | EP11 | COMMENTARY No. 1 of 4 – The Passionate Friar on YouTube
1:36 – Hell - Clown Core
3:05 – Valentino Khan - Deep Down Low (Official Music Video)
2:37 – IGORRR - VERY NOISE
6:27 – S01 | EP11 | COMMENTARY No. 2 of 4 – The Passionate Friar on YouTube
4:08 – Khruangbin - Evan Finds The Third Room (Official Video)
6:16 – Kamasi Washington - Street Fighter Mas
3:59 – Chelou - Damned Eye See (Official Video)
4:28 – Mcbaise - Water Slide (feat. Kamggarn)
6:14 – S01 | EP11 | COMMENTARY No. 3 of 4 – The Passionate Friar on YouTube
3:15 – Siouxsie And The Banshees - Peek-A-Boo
4:26 – Depeche Mode - Never Let Me Down Again (Official Video) (Heard on Episode 1 of The Last Of Us)
3:36 – FKA twigs - How's That
3:39 – S01 | EP11 | COMMENTARY No. 4 of 4 – The Passionate Friar on YouTube
10:48 – Animal Collective - Bridge To Quiet (Official Video)
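As a sanity check on the total, here's a sketch that adds up the fifteen times listed above and formats the result the way the script does. It isn't the script's code: to_seconds and sum_times are hypothetical helpers, and they use bash's 10# prefix to force base 10 (so "08" isn't read as octal) instead of the script's sed that strips the leading zero. Because to_seconds splits on every colon, h:mm:ss values also work.

```bash
#!/bin/bash
# Hypothetical helper: convert "m:ss" (or "h:mm:ss") to a number of seconds.
to_seconds() {
    local IFS=: part secs=0
    for part in $1; do                  # word-splits on ":" because of IFS
        secs=$(( secs * 60 + 10#$part ))
    done
    echo "$secs"
}

# Hypothetical helper: sum the given times and print them as XhYYmZZs.
sum_times() {
    local total=0 t
    for t in "$@"; do
        total=$(( total + $(to_seconds "$t") ))
    done
    printf '%dh%02dm%02ds\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# The fifteen track times from the output above.
sum_times 5:33 1:36 3:05 2:37 6:27 4:08 6:16 3:59 4:28 6:14 3:15 4:26 3:36 3:39 10:48
```

This prints 1h10m07s: 63 minutes plus 427 seconds is 4207 seconds, which matches the "Total time" line.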
And here's the script itself (if you have any questions about it, I'll try to answer if you ask):

#!/bin/bash

if [[ $# == 1 ]]; then
    ytPlaylistID=$1
else
    echo "Usage: getArbitraryPlaylist.sh <playlistID>"
    exit 1
fi

curTimeStamp=`date +%Y%m%d%H%M%S`
playlistDirPath="./playlists/$ytPlaylistID"

if [ ! -d $playlistDirPath ]; then
    mkdir -p $playlistDirPath
fi

targetFile=$playlistDirPath/playlistInfo_"$curTimeStamp".txt

ytKey="*** YOU NEED TO PUT YOUR OWN YOUTUBE AUTHENTICATION KEY HERE ***"
ytMaxResults="50"

echo "==> Getting playlist title and publishing time..."

tempFile=$playlistDirPath/"$curTimeStamp"_headerInfo.tmp
curl "https://www.googleapis.com/youtube/v3/playlists?part=snippet&id=$ytPlaylistID&key=$ytKey" --header 'Accept: application/json' --compressed > $tempFile

echo "URL: https://www.youtube.com/playlist?list=$ytPlaylistID" > $targetFile
printf "Title: \"%s\"\n" "`cat $tempFile | grep "\"title\"" | head -1 | sed 's/.*: "\(.*\)",/\1/'`" >> $targetFile
printf "Published: %s\n" "`cat $tempFile | grep "\"publishedAt\"" | head -1 | sed 's/.*: "\(.*\)",/\1/'`" >> $targetFile

rm $tempFile

tempFile=$playlistDirPath/"$curTimeStamp"_rawInfo.tmp
detailsTemp=$playlistDirPath/"$curTimeStamp"_playlistDetails.tmp

rm -f $detailsTemp

nextPageToken=""
while :
do
    echo "==> Loading a page of playlist details..."

    curl 'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&maxResults='$ytMaxResults'&playlistId='$ytPlaylistID'&key='$ytKey'&pageToken='$nextPageToken --header 'Accept: application/json' --compressed > $tempFile

    videoCount=`tail -10 $tempFile | grep "\"totalResults\"" | sed 's/.*: \(.*\),/\1/'`
    nextPageToken=`head -10 $tempFile | grep "\"nextPageToken\"" | sed 's/.*: "\(.*\)",/\1/'`
    cat $tempFile | grep -E "\"title\"|\"videoId\"" >> $detailsTemp

    if [[ -z $nextPageToken ]];
    then
	break
    fi
done

echo "Track Count: $videoCount" >> $targetFile
rm $tempFile

namesTemp=$playlistDirPath/"$curTimeStamp"_playlistNames.tmp
idsTemp=$playlistDirPath/"$curTimeStamp"_playlistIds.tmp
timesTemp=$playlistDirPath/"$curTimeStamp"_playlistTimes.tmp

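# Prefix each title with an en dash (\xe2\x80\x93 is its UTF-8 encoding; GNU sed understands \xHH escapes)
grep "\"title\"" $detailsTemp | sed 's/.*: "\(.*\)",/\xe2\x80\x93 \1/' > $namesTemp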
grep "\"videoId\"" $detailsTemp | sed 's/.*: "\(.*\)"/\1/' > $idsTemp

rm $detailsTemp

rm -f $timesTemp

for i in `cat $idsTemp`; do
    echo "==> Getting song details... Index = $i"
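    # Fetch this video's duration (ISO 8601, e.g. PT4M8S) and convert it to m:ss.
    # The sed steps handle PTnS, PTnM and PTnMnS only; a video an hour or longer
    # (PTnHnMnS) comes through unconverted and won't be added to the total correctly.
    curl 'https://www.googleapis.com/youtube/v3/videos?id='$i'&part=contentDetails&key='$ytKey --header 'Accept: application/json' --compressed | grep "duration" | sed 's/.*: "\(.*\)",/\1/' | sed 's/PT\([0-9]*\)S/PT0M\1S/' | sed 's/PT\([0-9]*\)M\([0-9]*\)S/\1:0\2/' | sed 's/\([0-9]*\):.*\([0-9][0-9]\)$/\1:\2/' | sed 's/^PT\([0-9]*\)M/\1:00/' >> $timesTemp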
done

rm $idsTemp

declare -i timeMin
declare -i timeSec

let minSum=0
let secSum=0

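# Strip a leading zero (e.g. "08" -> "8") before assigning to the integer
# variables; bash arithmetic would otherwise treat "08" as an invalid octal number.
while read -r timeStr; do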
    timeMin=`echo $timeStr | cut -d':' -f1 | sed 's/0\([0-9]\)/\1/'`
    timeSec=`echo $timeStr | cut -d':' -f2 | sed 's/0\([0-9]\)/\1/'`
    let minSum=minSum+timeMin
    let secSum=secSum+timeSec
done < $timesTemp

let minSum=minSum+secSum/60
let hourCnt=minSum/60
let minRem=minSum%60
let secRem=secSum%60

printf "Total time: %dh%02dm%02ds\n\n" $hourCnt $minRem $secRem >> $targetFile

paste -d' ' $timesTemp $namesTemp >> $targetFile

rm $timesTemp $namesTemp

cp $targetFile $playlistDirPath/playlistInfo.txt

exit 0
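The duration conversion is the fragile part of the script: the sed chain turns PTnS, PTnM and PTnMnS into m:ss, but a video an hour or longer (for example PT1H2M3S) comes through unconverted. Here's a sketch of an alternative using bash's own regex matching. iso_to_clock is a hypothetical helper, and it assumes durations of the form PT[nH][nM][nS], rejecting anything else (such as values with a day part).

```bash
#!/bin/bash
# Hypothetical helper: convert an ISO 8601 duration such as PT4M8S to m:ss,
# or to h:mm:ss when there is an hours part (PT1H2M3S -> 1:02:03).
iso_to_clock() {
    local re='^PT(([0-9]+)H)?(([0-9]+)M)?(([0-9]+)S)?$'
    [[ $1 =~ $re ]] || { echo "unrecognized duration: $1" >&2; return 1; }
    # Missing parts default to 0; 10# guards against leading zeros.
    local h=$(( 10#${BASH_REMATCH[2]:-0} ))
    local m=$(( 10#${BASH_REMATCH[4]:-0} ))
    local s=$(( 10#${BASH_REMATCH[6]:-0} ))
    if (( h > 0 )); then
        printf '%d:%02d:%02d\n' "$h" "$m" "$s"
    else
        printf '%d:%02d\n' "$m" "$s"
    fi
}

for d in PT45S PT5M PT4M8S PT10M48S PT1H2M3S; do
    iso_to_clock "$d"
done
```

To use it in the script, the duration could be pulled out with the existing grep and first sed, then handed to iso_to_clock in place of the remaining sed steps. The summing loop would also need to cope with h:mm:ss lines, since it currently assumes exactly one colon.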


Example YouTube playlist (my show, Season 1, Episode 11):
