I’ve been playing with Clojure recently and loving it. However, most of what I’ve done with it so far has been Exercism.io exercises, not actual application code. To rectify that, I pounded out (slammed my head against a few problems) a simple application (sure, now that it’s written) to get my secret feed from NSScreencast and download the videos.
Two Great Houses
Reading a feed and downloading movie files is very simple on the surface. Two things complicate it, and both matter. First, the feed is XML (Atom, going by the free feed shown later), and XML is best treated as a tree data structure, not as text to regex over. Second, NSScreencast uses a CDN, which means the video URLs return a redirect.
The Forest of Xml
The Clojure community has taken a shine to using zippers for traversing large trees of data. XML is just a tree of data, once you parse the damn thing. Fortunately, Clojure has a few libraries for handling this: clojure.xml, clojure.zip, and clojure.data.zip.xml.
Our Imports
[clojure.xml :as xml]
[clojure.data.zip.xml :as zip-xml]
[clojure.zip :as zip]
Building a zipper
Eventually, I need to have an XML zipper from the clojure.zip library. For that to happen, I need to give clojure.xml’s parse an input-stream based on the feed’s URL.
(defn make-feed [uri]
  (-> uri
      input-stream
      xml/parse
      zip/xml-zip))
The feed I process is built as follows:
(def feed (make-feed "https://nsscreencast.com/private_feed/«secret_key»?video_format=mp4"))
Yeah, I’m not giving you my secret key. Go subscribe! They are good!
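If you want to poke at a zipper without a subscription or a network connection, here is a minimal sketch that runs the same parse-then-zip pipeline over an in-memory string. The string->zipper helper and the tiny sample document are made up for illustration; they are not part of the downloader.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: same steps as make-feed, but fed from a string
;; instead of a URL so it can be tried offline.
(defn string->zipper [s]
  (-> (.getBytes s "UTF-8")
      (ByteArrayInputStream.)
      xml/parse
      zip/xml-zip))

(def sample
  (string->zipper
   "<feed><title>Demo</title><entry><title>#1 - First</title></entry></feed>"))

;; zip/node returns the element at the current location. At the root that
;; is the whole parsed tree: a map with :tag, :attrs and :content.
(:tag (zip/node sample))                      ; :feed

;; zip/down moves to the first child, zip/right to the next sibling.
(-> sample zip/down zip/node :tag)            ; :title
(-> sample zip/down zip/right zip/node :tag)  ; :entry
```

Walking by hand like this gets tedious quickly, which is why the rest of the code leans on clojure.data.zip.xml.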
Traversing a zipper
Here’s an example of the XML we are navigating. This is taken from NSScreencast’s free video feed and only has the first two entries.
<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
<id>tag:nsscreencast.com,2005:/feed</id>
<link rel="alternate" type="text/html" href="http://nsscreencast.com"/>
<link rel="self" type="application/atom+xml" href="http://nsscreencast.com/feed"/>
<title>NSScreencast (free videos)</title>
<updated>2014-01-29T00:00:13Z</updated>
<entry>
<id>tag:nsscreencast.com,2005:Episode/93</id>
<published>2013-09-26T10:02:32Z</published>
<updated>2013-09-26T10:02:32Z</updated>
<link rel="alternate" type="text/html" href="http://nsscreencast.com/episodes/87-xcode-5-autolayout-improvements"/>
<title>#87 - Xcode 5 Autolayout Improvements</title>
<content type="text">This week we have another free bonus video on the improvements that Xcode 5 brings to Autolayout. As something that has been quite obnoxious to work with in the past, many people dismissed auto layout when it was introduced to iOS 6. With these improvements it is much more friendly and dare I say... usable?</content>
<link rel="enclosure" xml:lang="en-US" title="Xcode 5 Autolayout Improvements" type="video/mp4" hreflang="en-US" href="http://nsscreencast.com/episodes/87-xcode-5-autolayout-improvements.m4v"/>
<author>
<name>Ben Scheirman</name>
</author>
</entry>
<entry>
<id>tag:nsscreencast.com,2005:Episode/91</id>
<published>2013-09-19T10:01:59Z</published>
<updated>2013-09-19T10:01:59Z</updated>
<link rel="alternate" type="text/html" href="http://nsscreencast.com/episodes/85-hello-ios-7"/>
<title>#85 - Hello, iOS 7</title>
<content type="text">To celebrate the launch of iOS 7, here is a bonus free screencast covering a few of the concepts in iOS 7 such as the status bar behavior, tint color, and navigation bar transitions. We'll also take a look at Xcode 5 with a couple of the new features, including the integrated test runner.</content>
<link rel="enclosure" xml:lang="en-US" title="Hello, iOS 7" type="video/mp4" hreflang="en-US" href="http://nsscreencast.com/episodes/85-hello-ios-7.m4v"/>
<author>
<name>Ben Scheirman</name>
</author>
</entry>
</feed>
Once I have the zipper for the feed, I need to traverse the feed’s tree. What I eventually want is a list of file names and the URL for each file. The file name will come from the feed, but to find the URL I need to traverse all entry tags, then their child link tags, keeping only the link tags with rel="enclosure". And honestly, I only care about the href attribute.
(zip-xml/xml-> feed
               :entry
               :link
               [(zip-xml/attr= :rel "enclosure")]
               (zip-xml/attr :href))
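To see what that traversal returns, here is a small sketch that runs the same xml-> query against a cut-down, made-up feed held in a string. The example.com URLs and the demo-feed name are assumptions for illustration, and the snippet assumes the data.zip library is on the classpath, as it is for the downloader.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip]
         '[clojure.data.zip.xml :as zip-xml])
(import '(java.io ByteArrayInputStream))

;; A made-up feed: two entries, the first with an extra non-enclosure link.
(def demo-feed
  (-> (str "<feed>"
           "<entry>"
           "<link rel=\"alternate\" href=\"http://example.com/episodes/1\"/>"
           "<link rel=\"enclosure\" href=\"http://example.com/episodes/1.m4v\"/>"
           "</entry>"
           "<entry>"
           "<link rel=\"enclosure\" href=\"http://example.com/episodes/2.m4v\"/>"
           "</entry>"
           "</feed>")
      (.getBytes "UTF-8")
      (ByteArrayInputStream.)
      xml/parse
      zip/xml-zip))

;; Keywords step down to child tags, the vector filters the current
;; locations without moving, and attr pulls out the attribute value.
(def enclosure-urls
  (zip-xml/xml-> demo-feed
                 :entry
                 :link
                 [(zip-xml/attr= :rel "enclosure")]
                 (zip-xml/attr :href)))

enclosure-urls
;; ("http://example.com/episodes/1.m4v" "http://example.com/episodes/2.m4v")
```

The rel="alternate" link is dropped by the attr= filter, so only the two enclosure hrefs survive.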
Once we have this, we need to find the file name. Anything between the last / and the only ? is the file name. Clojure returns a regex match that has a group as a vector: the whole match followed by the group. So, if we stretch the regex to match the whole URI, that vector holds the two pieces we need for further processing. We do need to map first over the results before doing anything else, because re-seq returns a sequence of matches for each URI and we want the first (and only) one.
(defn link-file-pairs []
  (->> «feedTraversal»
       (map #(re-seq #".*/(.*)\?.*" %))
       (map first)))
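The shape of the regex results is easier to see on a single URL. This sketch uses a made-up CDN-style URL (the real ones come out of the feed) and also shows re-matches, which may be a simpler fit here since the pattern is written to cover the whole string.

```clojure
;; Hypothetical URL, shaped like "…/<file name>?<query>".
(def sample-url "http://cdn.example.com/videos/87-xcode-5.mp4?token=abc123")
(def file-pattern #".*/(.*)\?.*")

;; re-seq returns a seq of matches; each match is [whole-match group].
(def all-matches (re-seq file-pattern sample-url))
;; (["http://cdn.example.com/videos/87-xcode-5.mp4?token=abc123" "87-xcode-5.mp4"])

;; Taking first unwraps the single match into the [uri file] pair.
(def pair (first all-matches))
;; ["http://cdn.example.com/videos/87-xcode-5.mp4?token=abc123" "87-xcode-5.mp4"]

;; re-matches requires the pattern to match the entire string and returns
;; that vector directly, so no (map first) step would be needed.
(def pair-via-re-matches (re-matches file-pattern sample-url))
```

Both pair and pair-via-re-matches hold the same two-element vector, which is exactly what copy-file and not-downloaded destructure later on.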
Looking in the wrong place
While Clojure’s input-stream is capable of handling a URI, it has some trickiness with SSL URLs and, more importantly, the redirect to the CDN is handed back as a document I would have to parse by hand. Fortunately, clj-http.client handles all this for us. We’ll require the lib as client. To copy a file from a URI to the disk, we use our copy-file routine.
(defn copy-file [[uri file]]
  (with-open [w (output-stream (fullname file))]
    (.write w (:body (client/get uri {:as :byte-array})))))
The function client/get returns a map of data, and we only care about the :body. We ask for the body to be returned as a byte array with {:as :byte-array}. with-open is used to close the output-stream once we’re done. And since we are using this to map over vectors of URI and file, we destructure the vector in the argument list.
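The with-open and destructuring parts can be tried without making an HTTP request. In this sketch, save-bytes is a hypothetical stand-in for copy-file that takes the bytes directly instead of fetching them, and writes to a temp file rather than to the videos folder.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io File))

;; Hypothetical stand-in for copy-file: the argument is a [bytes file]
;; pair, destructured in the parameter list just like [uri file].
(defn save-bytes [[data file]]
  ;; with-open closes the stream when the body finishes, even on error.
  (with-open [w (io/output-stream file)]
    (.write w ^bytes data)))

(def tmp (File/createTempFile "nsscreencast-demo" ".bin"))

(save-bytes [(.getBytes "not really a video" "UTF-8") tmp])

(.length tmp) ; 18, the number of bytes written
```

Because the function takes a single vector argument, it can be handed straight to map over a seq of pairs, which is how copy-file is used in -main.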
Fleshing out the rest
The main algorithm of the downloader is to get a list of link-file-pairs, filter out the videos I’ve already downloaded, and copy all the files. I tack on a doall to ensure the lazy seqs actually execute.
(defn -main []
  (->> (link-file-pairs)
       (filter not-downloaded)
       (map copy-file)
       (doall)))
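Why the doall matters is easy to demonstrate with a side-effecting function. This is a small sketch where fake-copy is a made-up stand-in for copy-file that just records what it was called with.

```clojure
(def log (atom []))

;; Hypothetical stand-in for copy-file: records its argument.
(defn fake-copy [x]
  (swap! log conj x)
  x)

;; map is lazy: defining the seq does not call fake-copy at all.
(def pending (map fake-copy [1 2 3]))
(def before-doall @log)   ; []

;; doall walks the whole seq, forcing every side effect.
(doall pending)
(def after-doall @log)    ; [1 2 3]

;; run! is an eager alternative when the return values are not needed.
(run! fake-copy [4 5])
@log                      ; [1 2 3 4 5]
```

In -main the results of copy-file are thrown away anyway, so run! or doseq would be a reasonable alternative to map plus doall.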
Determining if a file is downloaded is fairly simple. Figure out the full path of the expected file, make it a file object, and ask if it exists. But I want the inverse of exists, so we’ll tack a not on the end.
(defn not-downloaded [[uri file]]
  (-> file
      fullname
      as-file
      .exists
      not))
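The same check can be sketched with the positive predicate and remove instead of filter plus not. Everything here is illustrative: downloaded? is a hypothetical name, the pairs carry full paths rather than going through fullname, and the files live in the system temp directory.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io File)
        '(java.util UUID))

;; One file that exists, and one path that (almost certainly) does not.
(def existing (File/createTempFile "episode" ".mp4"))
(def missing (io/file (.getParentFile existing)
                      (str (UUID/randomUUID) ".mp4")))

;; Hypothetical positive form of not-downloaded.
(defn downloaded? [[_uri path]]
  (.exists (io/file path)))

(def pairs [["http://example.com/a" (.getPath existing)]
            ["http://example.com/b" (.getPath missing)]])

;; remove keeps the items for which the predicate is false.
(def todo (remove downloaded? pairs))
```

Here todo contains only the second pair, the one whose file is not on disk yet.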
What’s in a name?
I’ve already downloaded some files, so I wanted to make sure the new ones had the same naming scheme. Also, the fullname function knows where on my machine the files are supposed to go. The details aren’t interesting, as the prefix makes life more difficult than it needs to be.
(defn zero-pad [n number-string]
  (->> (concat (reverse number-string) (repeat \0))
       (take n)
       reverse
       (apply str)))
(defn prefix [file-name]
  (clojure.string/replace file-name #"^\d+" #(zero-pad 3 %)))
(defn fullname [file]
  (str "C:\\Users\\bjball\\Videos\\NSScreencasts\\ns" (prefix file)))
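For comparison, here is a sketch of the same padding done with format. The names zero-pad-fmt and prefix-fmt are made up, and the behaviour is not identical: the version above keeps only the last n digits of a longer number, while format never truncates.

```clojure
(require '[clojure.string :as str])

;; Hypothetical alternative to zero-pad: build a format string such as
;; "%03d" and let format do the padding.
(defn zero-pad-fmt [n number-string]
  (format (str "%0" n "d") (Long/parseLong number-string)))

;; With a group-free regex, the replacement function receives the matched
;; text as a plain string.
(defn prefix-fmt [file-name]
  (str/replace file-name #"^\d+" #(zero-pad-fmt 3 %)))

(prefix-fmt "87-xcode-5-autolayout-improvements.mp4")
;; "087-xcode-5-autolayout-improvements.mp4"

(zero-pad-fmt 3 "1234")
;; "1234" (the take-based zero-pad above would give "234")
```

With three-digit episode numbers either approach gives the same file names, so the difference only shows up past episode 999.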
The Details
The code is available on GitHub. As part of a literate programming exercise, the whole structure is laid out below.
(ns nsscreencast-fetcher.core
  (:use [clojure.java.io :only [output-stream as-file input-stream]])
  (:require [clj-http.client :as client]
            «xmlDecls»))
«makeFeed»
«feed»
«linkFilePairs»
«fullName»
«copyFile»
«notDownloaded»
«main»