Querying YouTube using the Google API
Aug. 30th, 2021 02:22 am
I am trying to automate, or at least streamline, the various production aspects of my weekly show.
For example, one of the silly things I had to do every week, which took at least 15 or 20 minutes, was to extract the names of the tracks from the playlist by hand after I had chosen and ordered that week's music (or just retype them if I couldn't get my cursor into the 2-pixel-wide target needed to copy from the YouTube UI, ugh). I also made two versions of the song list: a bulleted version to put in the playlist and video descriptions, and a version with the runtimes for my script. Again, all the reformatting was a chore. After a fairly major learning curve, I figured out the API, automated the API query using curl, and then scripted as much of the rest as I could to save typing and frustration. The script automatically opens the titles it extracts from the playlist's videos in Emacs, because there is no standard formatting for them at all (it's a free-form text field, and no two seem alike). There I edit the song titles however I want and add the extra information I like to include (such as whether a track is a cover or live, and if so, where and when). When I save the file and close Emacs, the rest of the script puts everything in the exact final format I want. There is still some manual intervention, but much less fiddly work than I used to have to do.
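The two list versions differ only in how each line starts. As a minimal sketch, assuming the tracks are already available as "m:ss|title" lines on standard input (that "|"-separated input and the "formatTracks" name are hypothetical; the script in this post builds its lists differently), both can come from one source. It uses the same UTF-8 byte escapes as the script: E2 80 93 for the en dash and E2 80 A2 for the bullet.

```bash
#!/bin/bash
# Hypothetical helper: read "m:ss|title" lines on stdin and print either the
# timed list ("4:13 – Title") or the bulleted list ("• Title").
formatTracks() {
  local mode=$1 time title
  while IFS='|' read -r time title; do
    if [[ $mode == timed ]]; then
      printf '%s \xe2\x80\x93 %s\n' "$time" "$title"   # en dash
    else
      printf '\xe2\x80\xa2 %s\n' "$title"              # bullet
    fi
  done
}

# Example:
#   printf '4:13|Song A\n0:23|Song B\n' | formatTracks timed
#   printf '4:13|Song A\n0:23|Song B\n' | formatTracks bulleted
```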
Step 1: You need a Google account
Step 2: Create a project in the Google Developers Console
Step 3: Obtain an API key (it's free... the default quota is 10,000 units per day without having to pay, and simple list queries like the ones in the script below cost 1 unit each)
Step 4: You then need to enable the YouTube Data API v3 for your "application" (project)... this was a bit wonky for me. I think I just tried a query without enabling the API, and the error message included a link that let me enable it easily.
Step 5: Fart around with the URLs to do different things and look at the docs
It's all here: https://developers.google.com/youtube/v3/getting-started
Step 6: If you just want to query YouTube playlists, modify the script below by putting your API key in place of the [API KEY] text. If you want to do something different, hopefully this gives you some ideas on how best to tackle your particular needs.
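Before writing a whole script, it can help to try single queries by hand. A minimal sketch with hypothetical helper names: "ytApiUrl" assembles a request URL from a resource name, a query string, and the key, and "ytApiError" checks whether a response is a Google API error object (which is what comes back, for example, when the API has not been enabled for the project yet).

```bash
#!/bin/bash
# Hypothetical helpers for trying YouTube Data API v3 queries by hand.

# $1 = resource (playlists, playlistItems, videos), $2 = query string, $3 = API key
ytApiUrl() {
  printf 'https://www.googleapis.com/youtube/v3/%s?%s&key=%s\n' "$1" "$2" "$3"
}

# Exit status 0 if the JSON on stdin is an error response (for example, the
# API is not enabled for the project yet, or the key is wrong)
ytApiError() {
  grep -q '"error":'
}

# Example (needs network access and a real key in $API_KEY):
#   id=PLcbc6Su4uUe8VDB5P6x1TY7AR2XK2RIkj
#   resp=$(curl -s "$(ytApiUrl playlistItems "part=snippet&maxResults=25&playlistId=$id" "$API_KEY")")
#   if ytApiError <<< "$resp"; then echo "$resp" >&2; else echo "$resp"; fi
```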
The script takes one parameter, the ID string of the playlist (e.g. for one of my show playlists, it's "PLcbc6Su4uUe8VDB5P6x1TY7AR2XK2RIkj"... you can get it by opening the playlist and copying it from the URL after the "list=" prefix). It leaves two files: the playlist with the runtimes at the start of each line, and the playlist just bulleted. The sed hexadecimal nonsense is because I wanted to use UTF-8 characters and the Linux utilities barf on them (in particular, the en dash [E2 80 93] and the bullet character [E2 80 A2]). The Google API queries return JSON data, but I just directly snarf what I need and strip the JSON tags and formatting... I am using very specific data, so it's easy to get it directly. The video titles and the running times come from two separate API resources (playlistItems and videos), so I have to get the videoId of each video in the playlist and query each video directly in a loop to get its runtime. Lastly, NO_AT_BRIDGE=1 stops Emacs (well, GTK) from bitching that it can't connect to the accessibility bus (the warning is pointless and just bugs me).
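The "list=" lookup, the line-by-line JSON snarfing, and the hand-off to the videos resource can each be sketched as a small Bash function. The names are hypothetical, and "jsonField" has the same limits as the sed approach (it assumes the API's pretty-printed, one-key-per-line output and does not unescape JSON strings). "joinIds" is an alternative to the per-video loop, not something the script does: the videos resource's "id" parameter accepts a comma-separated list of IDs (up to 50 per request, as far as I know), so one query could fetch all the durations.

```bash
#!/bin/bash
# Hypothetical helpers for the steps described above.

# Playlist ID: whatever follows "list=" in the URL, up to the next "&" if any
# (assumes the URL contains "list=").
playlistIdFromUrl() {
  local rest=${1#*list=}
  printf '%s\n' "${rest%%&*}"
}

# Value of one string key from pretty-printed JSON (one key per line, as the
# API returns it). No JSON parser: escaped quotes in values are left as-is.
jsonField() {
  grep "\"$1\": \"" | sed 's/.*": "\(.*\)",\{0,1\}$/\1/'
}

# One video ID per line in, one comma-separated line out, for a single
#   videos?part=contentDetails&id=ID1,ID2,...&key=...   request
joinIds() {
  paste -s -d, -
}
```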
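Note: You may need to re-join the lines I split with "\" in the listing for clarity, especially the URLs: inside single quotes, a "\" at the end of a line is not a line continuation, so the backslash and the newline end up in the URL. The other split lines should be fine, since they are just command-line options.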
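Why the URLs in particular need re-joining: Bash removes a backslash-newline pair (a line continuation) outside quotes and inside double quotes, but inside single quotes both characters are kept literally. A small demonstration with made-up variable names:

```bash
#!/bin/bash
# Backslash-newline inside single quotes: kept literally (8 characters).
single='abc\
def'
# Backslash-newline inside double quotes: removed, so the lines are joined.
double="abc\
def"
# Outside quotes: also a line continuation, which is why splitting the
# curl options themselves across lines is harmless.
unquoted=abc\
def
printf '%s\n' "${#single}" "$double" "$unquoted"
```

This prints 8, then abcdef twice.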
Edit 2021/10/18: I have made quite a number of changes to the script, and it now does most of what I want. Here are the changes from the description above. It now generates four files: playlistBulleted.txt (track names with bullets), playlistMusicTime.txt (total running time of the music, excluding my own segments... Bash math is always weird to do), playlistURL.txt (the URL of the playlist), and playlistWithTimes.txt (track names with running times). It normally saves the files in a directory named "Show####-yyyymmdd", which the script derives from the playlist title itself (I use a format like "Show #23 – The Passionate Friar on YouTube – 2021/10/03"; the script pulls out the show number, zero-pads it on the left, takes the date, and strips the forward slashes). If the directory does not exist, it is created, and a template script is copied into it along with the files generated by the script. If the directory exists, the files are just written into it (the template file is not copied again, so it doesn't trash my script if I've been working on it). If the "-t" flag is specified, the files are saved to "/tmp" instead of overwriting the ones in the show's directory (which I have usually edited and don't want trashed... I just added that today, oh well). I also fixed a couple of bugs: the YouTube running time format "PT<minutes>M<seconds>S" can also show up as "PT2M" (2 minutes and 0 seconds) or "PT23S" (0 minutes and 23 seconds). I saw both cases, and both are handled now.
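The title-to-directory naming, the duration formats, and the running-time total mentioned in the edit can also be done with Bash's own regex matching and arithmetic instead of a sed chain. A minimal sketch with hypothetical function names; unlike the script, "isoToMinSec" also accepts hours (PT1H2M3S becomes 62:03), and "sumTimes" uses the 10# prefix so that values like "08" are not rejected as invalid octal numbers.

```bash
#!/bin/bash
# Hypothetical helpers sketching the logic described in the edit, in pure Bash.

# "Show #23 – The Passionate Friar on YouTube – 2021/10/03" -> "Show0023-20211003"
showDirName() {
  local numRe='#([0-9]+)' dateRe='([0-9]{4})/([0-9]{2})/([0-9]{2})'
  [[ $1 =~ $numRe ]] || return 1
  local num=$(( 10#${BASH_REMATCH[1]} ))
  [[ $1 =~ $dateRe ]] || return 1
  printf 'Show%04d-%s%s%s\n' "$num" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

# ISO 8601 duration -> m:ss: PT4M13S -> 4:13, PT2M -> 2:00, PT23S -> 0:23,
# and PT1H2M3S -> 62:03 (hours are folded into minutes)
isoToMinSec() {
  local re='^PT(([0-9]+)H)?(([0-9]+)M)?(([0-9]+)S)?$'
  [[ $1 =~ $re ]] || return 1
  local h=${BASH_REMATCH[2]:-0} m=${BASH_REMATCH[4]:-0} s=${BASH_REMATCH[6]:-0}
  printf '%d:%02d\n' $(( 10#$h * 60 + 10#$m )) $(( 10#$s ))
}

# Sum "m:ss" lines from stdin and print the total as "XmYYs", the same
# format as playlistMusicTime.txt
sumTimes() {
  local t total=0
  while IFS= read -r t; do
    [[ -n $t ]] || continue
    total=$(( total + 10#${t%%:*} * 60 + 10#${t##*:} ))
  done
  printf '%dm%02ds\n' $(( total / 60 )) $(( total % 60 ))
}
```

For example, piping the lines 4:13, 2:00, and 0:23 into sumTimes prints 6m36s.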
```bash
#!/bin/bash
# Usage: extractPlaylist.sh [-t] <playlistId>
# -t causes it to save the files in /tmp and not copy the template script
apiKey='[API KEY]'
apiBase='https://www.googleapis.com/youtube/v3'

saveInTmp=0
if [[ $# == 1 ]]; then
  youtubeID=$1
elif [[ $# == 2 && $1 == "-t" ]]; then
  saveInTmp=1
  youtubeID=$2
else
  echo "Usage: extractPlaylist.sh [-t] <playlistId>"
  exit 1
fi

# Relies on the title being in a format like
# "Show #19 - The Passionate Friar on YouTube - 2021/09/05".
# Keep each URL on one line (a "\" line break inside single quotes is not a continuation).
playlistTitle=$(curl -s "$apiBase/playlists?part=snippet&maxResults=25&id=$youtubeID&key=$apiKey" \
    --header 'Accept: application/json' --compressed |
  grep '"title"' | head -1 | sed 's/.*: "\(.*\)",/\1/')
playlistNumber=$(echo "$playlistTitle" | sed 's/.*#\([0-9]*\).*/\1/')
playlistDate=$(echo "$playlistTitle" | sed 's/.*#[0-9]*[^0-9]*\(.*\)/\1/' | sed 's/\///g')

if [[ $saveInTmp == 0 ]]; then
  playlistShowName=$(printf "Show%04d-%s" "$playlistNumber" "$playlistDate")
else
  playlistShowName="/tmp"
fi
# A new show directory gets a copy of the script template; an existing one is left alone
if [ ! -d "$playlistShowName" ]; then
  mkdir "$playlistShowName"
  cp 00-Script_Template.odt "$playlistShowName/$(printf "00-Script%04d-%s.odt" "$playlistNumber" "$playlistDate")"
fi
printf "https://www.youtube.com/playlist?list=%s\n" "$youtubeID" > "$playlistShowName/playlistURL.txt"

# Titles and video IDs of the playlist items (the response has one JSON key per line)
curl -s "$apiBase/playlistItems?part=snippet&maxResults=25&playlistId=$youtubeID&key=$apiKey" \
    --header 'Accept: application/json' --compressed |
  grep -E '"title"|"videoId"' > playlistInfo.txt
# Prefix each title with an en dash (UTF-8 bytes E2 80 93)
grep '"title"' playlistInfo.txt | sed 's/.*: "\(.*\)",/\xe2\x80\x93 \1/' > playlistNames.txt
grep '"videoId"' playlistInfo.txt | sed 's/.*: "\(.*\)"/\1/' > playlistIds.txt

# Durations are on the videos resource, so query each video and turn
# PT4M13S / PT2M / PT23S into 4:13 / 2:00 / 0:23 (hours, PT1H..., are not handled)
rm -f playlistTimes.txt
for i in $(cat playlistIds.txt); do
  curl -s "$apiBase/videos?id=$i&part=contentDetails&key=$apiKey" \
      --header 'Accept: application/json' --compressed |
    grep '"duration"' | sed 's/.*: "\(.*\)",/\1/' |
    sed 's/PT\([0-9]*\)S/PT0M\1S/' |
    sed 's/PT\([0-9]*\)M\([0-9]*\)S/\1:0\2/' |
    sed 's/\([0-9]*\):.*\([0-9][0-9]\)$/\1:\2/' |
    sed 's/^PT\([0-9]*\)M/\1:00/' >> playlistTimes.txt
done
rm playlistIds.txt

# Drop my own segments (titles matching "Show...PF #"), then split times and names again
paste -d' ' playlistTimes.txt playlistNames.txt | grep -v ".*Show.*PF #" > playlistInfo.txt
cut -d' ' -f1 playlistInfo.txt > playlistTimes.txt
cut -d' ' -f2- playlistInfo.txt > playlistNames.txt
rm playlistInfo.txt

# Hand-edit the free-form titles; NO_AT_BRIDGE silences a pointless GTK warning
export NO_AT_BRIDGE=1
emacs playlistNames.txt
paste -d' ' playlistTimes.txt playlistNames.txt > "$playlistShowName/playlistWithTimes.txt"

# Total music time; 10# forces base 10 so "08" and "09" are not read as bad octal
minSum=0
secSum=0
for timeStr in $(cat playlistTimes.txt); do
  minSum=$(( minSum + 10#${timeStr%%:*} ))
  secSum=$(( secSum + 10#${timeStr##*:} ))
done
minSum=$(( minSum + secSum / 60 ))
secRem=$(( secSum % 60 ))
printf "%dm%02ds\n" "$minSum" "$secRem" > "$playlistShowName/playlistMusicTime.txt"
rm playlistTimes.txt

# Bulleted version: swap the en dash for a bullet (UTF-8 bytes E2 80 A2)
sed 's/\xe2\x80\x93/\xe2\x80\xa2/' playlistNames.txt > "$playlistShowName/playlistBulleted.txt"
rm -f playlistNames.txt*   # also removes the Emacs backup file playlistNames.txt~

echo "Titles with times for Show #$playlistNumber:"
cat "$playlistShowName/playlistWithTimes.txt"
echo
echo "Bulleted titles for Show #$playlistNumber:"
cat "$playlistShowName/playlistBulleted.txt"
echo
echo "Playlist URL for Show #$playlistNumber:"
cat "$playlistShowName/playlistURL.txt"
echo
echo "Music running time for Show #$playlistNumber:"
cat "$playlistShowName/playlistMusicTime.txt"
exit 0
```
In case you hadn't guessed, this is mostly documentation for me when I try to remember what I did, but I do hope that someone with a similar issue finds it and saves some time with it.
If you don't care about my technical ramblings, at least I have provided an hour of music and commentary to make up for it (note: shameless plug... and I legit do hear from folks that it's pretty good).