
Are You Doing Anything Real With It?


I’ve been playing with Clojure recently and loving it. However, most of what I’ve done with it has been Exercism.io exercises, not actual application code. To rectify that, I pounded out (slammed my head against a few problems) a simple application (sure, now that it’s written) to get my secret feed from NSScreencast and download the videos.

Two Great Houses

Reading a feed and downloading movie files is very simple on the surface, but two things complicate it, and both matter. First, the feed is XML (the sample below is Atom), and XML is best treated as a tree data structure, not as text to regex over. Second, NSScreencast uses a CDN, which means the video URLs return a redirect.

The Forest of Xml

The Clojure community has taken a shine to using zippers for traversing large trees of data. XML is just a tree of data, once you parse the damn thing. Fortunately, Clojure has a few libraries for handling this: clojure.xml, clojure.zip, and clojure.data.zip.xml.

Our Imports

            [clojure.xml :as xml]
            [clojure.data.zip.xml :as zip-xml]
            [clojure.zip :as zip]

Building a zipper

Eventually, I need an XML zipper from the clojure.zip library. To get one, I give clojure.xml’s parse an input-stream opened on the feed’s URL, then hand the parsed tree to xml-zip.

(defn make-feed [uri]
  (-> uri
      input-stream
      xml/parse
      zip/xml-zip))
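
To see what make-feed hands back, here is a minimal sketch that builds the same kind of zipper from an in-memory string instead of a URL. The string->zipper helper and the tiny sample document are made up for illustration; only xml/parse and zip/xml-zip come from the code above.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])
(import '[java.io ByteArrayInputStream])

;; Hypothetical helper: parse XML held in a string rather than fetched from a URL.
(defn string->zipper [s]
  (-> (.getBytes s "UTF-8")
      (ByteArrayInputStream.)
      xml/parse
      zip/xml-zip))

(def sample
  (string->zipper "<feed><title>demo</title><entry><id>1</id></entry></feed>"))

;; zip/node returns the parsed element (a map) at the current location.
(:tag (zip/node sample))                      ; => :feed

;; zip/down moves to the first child, zip/right to the next sibling.
(-> sample zip/down zip/node :tag)            ; => :title
(-> sample zip/down zip/right zip/node :tag)  ; => :entry
```

The zipper is just a cursor over the parsed tree, so every step returns a new location rather than mutating anything.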

The feed I process is built as follows:

(def feed (make-feed "https://nsscreencast.com/private_feed/«secret_key»?video_format=mp4"))

Yeah, I’m not giving you my secret key. Go subscribe! They are good!

Traversing a zipper

Here’s an example of the XML we are navigating. This is taken from NSScreencast’s free video feed and only has the first two entries.

<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:nsscreencast.com,2005:/feed</id>
  <link rel="alternate" type="text/html" href="http://nsscreencast.com"/>
  <link rel="self" type="application/atom+xml" href="http://nsscreencast.com/feed"/>
  <title>NSScreencast (free videos)</title>
  <updated>2014-01-29T00:00:13Z</updated>
  <entry>
    <id>tag:nsscreencast.com,2005:Episode/93</id>
    <published>2013-09-26T10:02:32Z</published>
    <updated>2013-09-26T10:02:32Z</updated>
    <link rel="alternate" type="text/html" href="http://nsscreencast.com/episodes/87-xcode-5-autolayout-improvements"/>
    <title>#87 - Xcode 5 Autolayout Improvements</title>
    <content type="text">This week we have another free bonus video on the improvements that Xcode 5 brings to Autolayout.  As something that has been quite obnoxious to work with in the past, many people dismissed auto layout when it was introduced to iOS 6.  With these improvements it is much more friendly and dare I say... usable?</content>
    <link rel="enclosure" xml:lang="en-US" title="Xcode 5 Autolayout Improvements" type="video/mp4" hreflang="en-US" href="http://nsscreencast.com/episodes/87-xcode-5-autolayout-improvements.m4v"/>
    <author>
      <name>Ben Scheirman</name>
    </author>
  </entry>
  <entry>
    <id>tag:nsscreencast.com,2005:Episode/91</id>
    <published>2013-09-19T10:01:59Z</published>
    <updated>2013-09-19T10:01:59Z</updated>
    <link rel="alternate" type="text/html" href="http://nsscreencast.com/episodes/85-hello-ios-7"/>
    <title>#85 - Hello, iOS 7</title>
    <content type="text">To celebrate the launch of iOS 7, here is a bonus free screencast covering a few of the concepts in iOS 7 such as the status bar behavior, tint color, and navigation bar transitions.  We'll also take a look at Xcode 5 with a couple of the new features, including the integrated test runner.</content>
    <link rel="enclosure" xml:lang="en-US" title="Hello, iOS 7" type="video/mp4" hreflang="en-US" href="http://nsscreencast.com/episodes/85-hello-ios-7.m4v"/>
    <author>
      <name>Ben Scheirman</name>
    </author>
  </entry>
</feed>

Once I have the zipper for the feed, I need to traverse the feed’s tree. What I want eventually is a list of file names paired with the URL for each file. The file name will come from the feed, but to find the URL I need to traverse all entry tags, then their child link tags, keeping only the link tags with rel="enclosure". And honestly, I only care about the href attribute.

(zip-xml/xml-> feed
               :entry
               :link
               [(zip-xml/attr= :rel "enclosure")]
               (zip-xml/attr :href))
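
For comparison, the same selection can be approximated without clojure.data.zip.xml by walking the parsed maps directly. This is a different technique from the zipper traversal above, shown only to make each filtering step explicit. The enclosure-hrefs name and the sample document are made up for illustration.

```clojure
(require '[clojure.xml :as xml])
(import '[java.io ByteArrayInputStream])

;; A made-up feed with the same shape as the real one: entries holding links.
(def sample-feed
  (str "<feed>"
       "<link rel=\"self\" href=\"http://example.com/feed\"/>"
       "<entry>"
       "<link rel=\"alternate\" href=\"http://example.com/episodes/1\"/>"
       "<link rel=\"enclosure\" href=\"http://example.com/episodes/1.m4v\"/>"
       "</entry>"
       "<entry>"
       "<link rel=\"enclosure\" href=\"http://example.com/episodes/2.m4v\"/>"
       "</entry>"
       "</feed>"))

;; Hypothetical equivalent of the traversal:
;; entry children of the root -> their link children -> rel="enclosure" -> href.
(defn enclosure-hrefs [root]
  (for [entry (:content root)
        :when (= :entry (:tag entry))
        link  (:content entry)
        :when (and (= :link (:tag link))
                   (= "enclosure" (get-in link [:attrs :rel])))]
    (get-in link [:attrs :href])))

(def hrefs
  (enclosure-hrefs
   (xml/parse (ByteArrayInputStream. (.getBytes sample-feed "UTF-8")))))
;; hrefs => ("http://example.com/episodes/1.m4v" "http://example.com/episodes/2.m4v")
```

The top-level link with rel="self" is skipped because it is not inside an entry, which is the same reason the zipper version starts its path at :entry.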

Once we have this, we need to find the file name. Anything between the last / and the only ? is the file name. Clojure returns a regex match with groups as a vector: the whole match followed by each group. So, if we stretch the regex to match the whole URI, the match vector holds the two pieces we need for further processing. Because re-seq returns a sequence of such vectors for each URI, we map first over the results to pull out the single match.

(defn link-file-pairs []
  (->> «feedTraversal»
       (map #(re-seq #".*/(.*)\?.*" %))
       (map first)))
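
The shape of the re-seq result is easier to see on a single string. This is a small sketch; the URL is made up, and merely shaped like a link with one query string.

```clojure
;; Made-up URL: a path followed by a single query string.
(def uri "http://cdn.example.com/videos/087-some-episode.mp4?token=abc123")

;; re-seq returns a seq of matches; each match is [whole-match group-1].
(def matches (re-seq #".*/(.*)\?.*" uri))

;; The regex spans the whole string, so there is exactly one match.
(def pair (first matches))
(first pair)   ; => the full URI
(second pair)  ; => "087-some-episode.mp4"

;; re-find returns that same vector directly, without the wrapping seq.
(re-find #".*/(.*)\?.*" uri)

;; A URL with no "?" does not match at all, so first yields nil.
(first (re-seq #".*/(.*)\?.*" "http://example.com/episodes/85-hello-ios-7.m4v"))
;; => nil
```

Since each URI produces at most one match here, re-find could stand in for the re-seq plus first combination; either way the result is the [uri file] vector the later functions destructure.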

Looking in the wrong place

While clojure.java.io’s input-stream is capable of handling a URI, it has some trickiness with SSL URLs and, more importantly, the redirect to the CDN is handed back as a document I would have to parse by hand. Fortunately, clj-http.client handles all this for us. We’ll require the lib as client. To copy a file from a URI to the disk, we use our copy-file routine.

(defn copy-file [[uri file]]
  (with-open [w (output-stream (fullname file))]
    (.write w (:body (client/get uri {:as :byte-array})))))

The function client/get returns a map of data, and we only care about the :body. We ask for the body to be returned as a byte array with {:as :byte-array}. with-open closes the output-stream once we’re done. And since we map this function over [uri file] vectors, we destructure the vector in the argument list.
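
The write half of copy-file can be tried without touching the network. In this sketch the bytes come from a string rather than from client/get, and write-bytes! and the temp file are made-up stand-ins; only with-open, output-stream, and the destructured argument vector mirror the real function.

```clojure
(require '[clojure.java.io :as io])
(import '[java.io File])

;; Hypothetical stand-in for copy-file: takes a [data target] vector,
;; destructured in the argument list the same way copy-file takes [uri file].
(defn write-bytes! [[data target]]
  (with-open [w (io/output-stream target)]
    ;; with-open closes w when the body finishes, even if .write throws.
    (.write w data)))

(def tmp (File/createTempFile "copy-file-sketch" ".bin"))
(def payload (.getBytes "pretend this is a video" "UTF-8"))

(write-bytes! [payload tmp])

;; Every byte of the payload ended up on disk.
(= (alength payload) (.length tmp))  ; => true
```

Note that asking for a byte array means the whole response sits in memory before it is written, which is fine for a sketch but worth keeping in mind for large videos.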

Fleshing out the rest

The main algorithm of the downloader is to get a list of link-file-pairs, filter out the videos I’ve already downloaded, and copy the remaining files. I tack on a doall to ensure the lazy seq actually executes.

(defn -main []
   (->> (link-file-pairs)
        (filter not-downloaded)
        (map copy-file)
        (doall)))
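
Why the doall matters can be shown with a counter. This is a small sketch; noisy-copy is a made-up stand-in for copy-file that only records that it ran.

```clojure
;; Counts how many times the side-effecting function has actually run.
(def calls (atom 0))

;; Hypothetical stand-in for copy-file.
(defn noisy-copy [x]
  (swap! calls inc)
  x)

;; map is lazy: building the seq runs nothing.
(def pending (map noisy-copy [:a :b :c]))
(def calls-before @calls)   ; => 0

;; doall walks the whole seq, forcing every side effect.
(doall pending)
(def calls-after @calls)    ; => 3

;; When the results are thrown away, run! is an eager alternative to map + doall.
(run! noisy-copy [:a :b :c])
(def calls-final @calls)    ; => 6
```

Without the doall, a -main whose return value is never consumed could exit without downloading anything.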

Determining if a file is downloaded is fairly simple. Figure out the full path of the expected file, make it a file object, and ask if it exists. But I want the inverse of exists, so we’ll tack a not on the end.

(defn not-downloaded [[uri file]]
  (-> file
      fullname
      as-file
      .exists
      not))
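
Here is a sketch of the filter in action against a temp directory. The not-downloaded-in variant takes the directory as an argument, which is an assumption made so the example does not depend on the hard-coded path inside fullname; the file names are made up.

```clojure
(require '[clojure.java.io :as io])
(import '[java.io File]
        '[java.util UUID])

;; Hypothetical variant of not-downloaded with the target directory passed in.
(defn not-downloaded-in [dir [_uri file]]
  (not (.exists (io/file dir file))))

;; One file that really exists, and one random name that should not.
(def existing (File/createTempFile "episode" ".mp4"))
(def dir (.getParent existing))
(def missing-name (str (UUID/randomUUID) ".mp4"))

(def pairs
  [["http://example.com/a" (.getName existing)]
   ["http://example.com/b" missing-name]])

;; Only the pair whose file is absent survives the filter.
(def to-fetch (filter (partial not-downloaded-in dir) pairs))
;; to-fetch => (["http://example.com/b" missing-name])
```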

What’s in a name?

I’ve already downloaded some files, so I wanted to make sure new ones had the same naming scheme. Also, the fullname function knows where on my machine the files are supposed to go. The details aren’t interesting, as the prefix makes life more difficult than it needs to be.

(defn zero-pad [n number-string]
  (->> (concat (reverse number-string) (repeat \0))
       (take n)
       reverse
       (apply str)))

(defn prefix [file-name]
  (clojure.string/replace file-name #"^\d+" #(zero-pad 3 %)))

(defn fullname [file]
  (str "C:\\Users\\bjball\\Videos\\NSScreencasts\\ns" (prefix file)))
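
A few sample calls show what the padding does, including one edge case. zero-pad and prefix are copied from above; the file name comes from the sample feed, and the format call at the end is an alternative I am suggesting, not part of the original code.

```clojure
(require '[clojure.string :as string])

(defn zero-pad [n number-string]
  (->> (concat (reverse number-string) (repeat \0))
       (take n)
       reverse
       (apply str)))

(defn prefix [file-name]
  (string/replace file-name #"^\d+" #(zero-pad 3 %)))

(zero-pad 3 "7")   ; => "007"
(zero-pad 3 "87")  ; => "087"

;; Edge case: take keeps only the last n digits, so longer numbers are truncated.
(zero-pad 3 "1234") ; => "234"

(prefix "87-xcode-5-autolayout-improvements.m4v")
;; => "087-xcode-5-autolayout-improvements.m4v"

;; Alternative sketch: format pads without truncating.
(format "%03d" (Long/parseLong "87"))    ; => "087"
(format "%03d" (Long/parseLong "1234"))  ; => "1234"
```

The truncation only bites once episode numbers pass 999, so it may never matter in practice.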

The Details

The code is available on Github. As a part of a literate programming exercise, the structure is all laid out below.

(ns nsscreencast-fetcher.core
  (:use [clojure.java.io :only [output-stream as-file input-stream]])
  (:require [clj-http.client :as client]
            [clojure.string]
            «xmlDecls»))
«makeFeed»
«feed»
«linkFilePairs»
«fullName»
«copyFile»
«notDownloaded»
«main»