Google has been known to terminate entire Youtube channels without notice in the hope of staying advertiser-friendly. 58 million “problematic” videos were deleted from the platform in Q3 2018 1. This week we are exploring various Youtube archival solutions that utilizes youtube-dl, a command-line program that extracts and download videos from web pages.

Channels removed, by removal reason

Installation

Install youtube-dl via pip to ensure that you have the latest version. Google often pushes updates that breaks youtube-dl so you always want to stay up to date. You might be able to install it via your distibution’s package manager but it’s often several versions behind.

pip install --upgrade youtube-dl
youtube-dl --version

Basic usage

You can now use the youtube-dl command to download a single video, a playlist or a channel:

youtube-dl https://www.youtube.com/watch?v=XXXXXXXXXX
youtube-dl https://www.youtube.com/channel/XXXXXXXXXX

You can use the same command to download videos from various websites.Youtube-dl is also a Python library that you can use in scripts:

1
2
3
4
5
6
#!/usr/bin/python

import youtube_dl

yt_dl = youtube_dl.YoutubeDL()
yt_dl.download("https://www.youtube.com/watch?v=XXXXXXXXXXXX")

Command line arguments & Scripting

This document is not a replacement for youtube-dl’s documentation, you can find the updated list of command-line arguments on its Github page.

Below is a bash script that will download everything specified in list.txt, a text file with a youtube channel or video on each line. The script will save the URLs of the videos in archive.txt as it downloads them to speedup the subsequent executions. The --write-info-json and --write-thumbnail options ensures that we also download the video metadata such as the description and the title.

1
2
3
4
5
6
7
youtube-dl -a list.txt -o '%(uploader)s/%(title)s-%(id)s.%(ext)s' \
	--write-thumbnail\
	-f "bestvideo[ext=webm]+bestaudio[ext=webm]/bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio/best"\
	--write-info-json\
	--geo-bypass\
	--ignore-errors\
	--download-archive archive.txt

You can decide to run the script periodically or to schedule it with cron:

crontab -e
# Download new videos every 2 hours
0 */2 * * *
/mnt/Archive/Youtube/archive_yt.sh

Live streams

While a cron job downloads all videos uploaded by a channel (if the uploader does not delete the video between executions), it does not handle live streams. Youtube-dl allows you to download live streams with the same command but you obviously have to start the execution during the stream. Below is a different approach that takes advantage of the Youtube email notification feature. This simple Python script reads your last 3 emails and searches for a Youtube link in the email body. It will immediately start downloading the video using the youtube-dl Python library. You can use this method to download uploaded videos as well as live streams. If you are using a Gmail account, you will need to generate an App Password to allow the script to login.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/usr/bin/python
  
import imaplib
import re
import youtube_dl

# Initialize the youtube-dl downloader, nooverwrites param will
# skip videos that are already downloaded or 
# currently being downloaded
yt_dl = youtube_dl.YoutubeDL(params={
    "nooverwrites": True,
    "nopart": True,
    "format": "bestvideo[ext=webm]+bestaudio[ext=webm]/bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio/best"
    })

# This regex pattern matches Youtube video links
YT_LINK = re.compile("Fv%3D([^%]*)%")

mail = imaplib.IMAP4_SSL("imap.gmail.com")
mail.login("username@gmail.com", "password")
mail.list()
mail.select("INBOX")

# Fetch the last 3 emails
_, data = mail.search(None, "ALL")
last_emails = list(reversed(data[0].split()))[:3]

for num in last_emails:
    _, data = mail.fetch(num, "(RFC822)")
    body = data[0][1].decode()

    # Check for pattern match
    match = YT_LINK.search(body)
    if match:
        url = "https://youtube.com/watch?v=" + match.group(1)
        yt_dl.download([url, ]) # immediately download

Automatically upload to rclone remote

To take advantage of cloud storage, you can setup ytdlrc to automatically move videos to an rclone remote as they are downloaded. This simple script is completely interchangeable with youtube-dl and can be set up on a machine with low disk space. The script uses your existing youtube-dl and rclone configuration and is ideal for setting up automatic Youtube archival on a cheap VPS.

Archiving Metadata

If you wish to save a video’s metadata without downloading the actual video, there are command-line utilities dedicated to this task.


Sources