A recent comment on BoingBoing asked if there was a way to download a video from youtube, such that it could be reposted elsewhere. One solution, suggested by Cory Doctorow, is to use pwnyoutube.com, but there exists a general method that works on all flash video (not just youtube), and happens to be faster than using pwnyoutube.com. Behold! For I shall demonstrate a painless use of lsof, the under-appreciated and extra-useful command line tool.
In the Internet video world, there are two kinds of creatures: streaming video (which is appropriate for live events) and buffered video (which is for recorded things, like youtube). "Buffering" means it's actually downloading a file in the background, and if it can download a little faster than you can watch it, then everything plays smoothly. If the video pauses suddenly and restarts after a few seconds, that's because it's rebuffering.

Have you ever noticed how the youtube progress bar slowly fills in with a pinkish color? It's at about 20% in the picture, above. That indicates how much of the file has been buffered, and when it reaches the end, it means the file is fully downloaded. In other words, you don't need some special plugin or service to "download a video from youtube." Your browser does it automatically! Even better, this happens for any website that uses flash and .flv files to deliver buffered video.
The key is to use lsof (the name is short for "list open files"). I'm demonstrating this on OS X, but the process is basically the same with *nixes and Cygwin. If you don't have lsof installed by default, just use your package manager to install it (e.g. apt-get install lsof).
So, the magical incantation is:
lsof |grep lash
I grep for "lash" instead of "Flash" since you never know if the F will be capitalized or not, and this is the laziest way to get the desired results. Here is an example of the output:

Notice the files FlashTmp0 and FlashTmp1? That's where the video files are saved, so long as you keep your browser window and video tabs open. There's no need to "download" a video that you just watched. Instead, simply copy the file straight to your Desktop:
cp /private/var/folders/.../TemporaryItems/FlashTmp1 ~/Desktop/rickroll.flv
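If you would rather script the search than eyeball the grep output, the same filter can be sketched in Python. This is only an illustration: the function name and the sample text are invented for the example (the sample is not a real lsof capture), and it assumes the file name is the last thing on each lsof line.

```python
def flash_temp_files(lsof_output):
    """Return paths from lsof lines that mention "flash" in any letter case."""
    paths = []
    for line in lsof_output.splitlines():
        if "flash" not in line.lower():
            continue
        # lsof prints the file name last; take everything from the first
        # absolute path on the line so names containing spaces survive.
        start = line.find(" /")
        if start != -1:
            paths.append(line[start + 1:])
    return paths


# Invented sample in the general shape of lsof output (not a real capture).
sample = (
    "COMMAND  PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME\n"
    "Safari   101 me    12u REG  14,2   1048576  555  /tmp/demo/FlashTmp0\n"
    "Safari   101 me    13u REG  14,2   2097152  556  /tmp/demo/FlashTmp1\n"
    "Finder   202 me    cwd DIR  14,2   1024     2    /\n"
)
print(flash_temp_files(sample))
```

To try it on live data, pass in the text returned by `subprocess.run(["lsof"], capture_output=True, text=True).stdout` instead of the sample.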
Now, you can open the local file with VLC:

You might need to try multiple FlashTmp files before you find the one containing the video you want (i.e. is it FlashTmp0 or FlashTmp1) but there usually aren't many. On many non-youtube sites, this is the only way you're going to get access to a buffered flash video (since there aren't handy pwnyoutube.com clones for everything).
Once you have copied the file to your desktop, why not convert it to mp4 and edit it in iMovie?
ffmpeg -i rickroll.flv rickroll.mp4
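If you end up with several copied files, a tiny helper can build the same ffmpeg command for each one. This is a sketch, not a tested workflow: the helper name is made up, and it only assembles the argument list shown above without running anything.

```python
import os


def ffmpeg_command(source, extension=".mp4"):
    """Build the argument list for: ffmpeg -i <source> <source with new extension>."""
    base, _ = os.path.splitext(source)
    return ["ffmpeg", "-i", source, base + extension]


print(ffmpeg_command("rickroll.flv"))
```

Passing the resulting list to `subprocess.run(...)` would actually launch the conversion, assuming ffmpeg is on your PATH.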

And now you know how to access videos you just watched, as well as convert them into a format you can edit.
I'm working on a REST interface at the moment, and there's nothing I need more than a quick utility to test out various functions. Curl fills this role perfectly, but I have run into a strange problem that interferes with multipart/form-data form POSTing. Let me explain some of the evidence I've collected, as well as tell you a workaround I learned from an IRC conversation. In the end, this comes down to the HTTP 1.1 100 CONTINUE response code, which plays a critical role in HTTP 1.1 POST.
For starters, I'm testing this out with OS X 10.5:
$ uname -a
Darwin osx.example.com 9.7.0 Darwin Kernel Version 9.7.0: Tue Mar 31 22:52:17
PDT 2009; root:xnu-1228.12.14~1/RELEASE_I386 i386 i386
For all of the output in this post, I used this version of curl:
$ curl --version
curl 7.19.6 (i386-apple-darwin9.7.0) libcurl/7.19.6 zlib/1.2.3
Protocols: tftp ftp telnet dict http file
Features: Largefile libz
Next up, we have the web server I was testing with:
$ curl -I osx.example.com:8000
HTTP/1.0 302 FOUND
Date: Fri, 18 Sep 2009 13:57:22 GMT
Server: WSGIServer/0.1 Python/2.6.2
I've been using wireshark and tcpdump to watch the traffic. Here's an example invocation of tcpdump that you can work with to replicate the issue:
sudo tcpdump -X -s 1500 -i lo0 tcp port 8000
Obviously, you might run a testing webserver on port 80, and you might send your traffic over lo, eth0, or en0. If you're reading this post, you probably know what's what, and how to modify the command accordingly.
It starts when I try to issue a multipart form POST:
curl -F name=somevalue http://osx.example.com:8000
This means "set the field called 'name' to 'somevalue' and instead of url encoding it, post it as a multipart MIME message." Your browser does this any time you upload a file to a website. In my case, my REST API lets me upload PDF files, so curl needs to use multipart instead of url encoding for this purpose.
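To make the multipart format concrete, here is a rough Python sketch of the body that such a request carries. It is not curl's actual code: the function name is invented, and the boundary is copied from one of the captures later in this post (curl generates a different random suffix on every run). With that boundary and the single field from the command above, the body comes out at 148 bytes, which matches the Content-Length seen in the captures.

```python
def multipart_body(fields, boundary):
    """Encode simple text fields as a multipart/form-data body (bytes)."""
    lines = []
    for name, value in fields:
        lines.append("--" + boundary)  # each part opens with "--" + boundary
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")               # blank line separates part headers from the value
        lines.append(value)
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")                      # body ends with a final CRLF
    return "\r\n".join(lines).encode("ascii")


# 28 dashes plus 12 hex characters, as seen in the capture.
boundary = "-" * 28 + "e928ad0322b4"
body = multipart_body([("name", "somevalue")], boundary)
print(len(body))
```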
Curl only generates one TCP packet based on this command (even though it should generate multiple) and this is what that one packet looks like:
10:07:00.971856 IP osx.57777 > osx.8000: P 1:260(259) ack 1 win 65535
<nop,nop,timestamp 1076257225 1076257225>
0x0000: 4500 0137 03b9 4000 4006 0000 7f00 0001 E..7..@.@.......
0x0010: 7f00 0001 e1b1 1f40 3f35 c1d8 0bb4 4ba7 .......@?5....K.
0x0020: 8018 ffff ff2b 0000 0101 080a 4026 61c9 .....+......@&a.
0x0030: 4026 61c9 504f 5354 202f 2048 5454 502f @&a.POST./.HTTP/
0x0040: 312e 310d 0a55 7365 722d 4167 656e 743a 1.1..User-Agent:
0x0050: 2063 7572 6c2f 372e 3139 2e36 2028 6933 .curl/7.19.6.(i3
0x0060: 3836 2d61 7070 6c65 2d64 6172 7769 6e39 86-apple-darwin9
0x0070: 2e37 2e30 2920 6c69 6263 7572 6c2f 372e .7.0).libcurl/7.
0x0080: 3139 2e36 207a 6c69 622f 312e 322e 330d 19.6.zlib/1.2.3.
0x0090: 0a48 6f73 743a 2031 3237 2e30 2e30 2e31 .Host:.127.0.0.1
0x00a0: 3a38 3030 300d 0a41 6363 6570 743a 202a :8000..Accept:.*
0x00b0: 2f2a 0d0a 436f 6e74 656e 742d 4c65 6e67 /*..Content-Leng
0x00c0: 7468 3a20 3134 380d 0a45 7870 6563 743a th:.148..Expect:
0x00d0: 2031 3030 2d63 6f6e 7469 6e75 650d 0a43 .100-continue..C
0x00e0: 6f6e 7465 6e74 2d54 7970 653a 206d 756c ontent-Type:.mul
0x00f0: 7469 7061 7274 2f66 6f72 6d2d 6461 7461 tipart/form-data
0x0100: 3b20 626f 756e 6461 7279 3d2d 2d2d 2d2d ;.boundary=-----
0x0110: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0120: 2d2d 2d2d 2d2d 2d37 3937 3037 6465 6438 -------79707ded8
0x0130: 6339 640d 0a0d 0a c9d....
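We can tell a few things from this. First, the packet carries 259 bytes of HTTP headers (tcpdump's 1:260(259) means sequence numbers 1 through 259, with 260 expected next), the HTTP Content-Length header indicates a forthcoming payload of 148 bytes, and curl has set the HTTP Expect header to: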
Expect: 100-continue
Notice that it gets all the way up to the MIME boundary, which is fine. The server responds with a TCP ACK for the bytes received (ack 260), then sends back an HTTP reply:
10:07:00.971880 IP osx.8000 > osx.57777: . ack 260 win 65535 <nop,nop,
timestamp 1076257225 1076257225>
0x0000: 4500 0034 cda2 4000 4006 0000 7f00 0001 E..4..@.@.......
0x0010: 7f00 0001 1f40 e1b1 0bb4 4ba7 3f35 c2db .....@....K.?5..
0x0020: 8010 ffff fe28 0000 0101 080a 4026 61c9 .....(......@&a.
0x0030: 4026 61c9 @&a.
10:07:00.973365 IP osx.8000 > osx.57777: P 1:21(20) ack 260 win 65535
<nop,nop,timestamp 1076257225 1076257225>
0x0000: 4500 0048 1bbf 4000 4006 0000 7f00 0001 E..H..@.@.......
0x0010: 7f00 0001 1f40 e1b1 0bb4 4ba7 3f35 c2db .....@....K.?5..
0x0020: 8018 ffff fe3c 0000 0101 080a 4026 61c9 .....<......@&a.
0x0030: 4026 61c9 4854 5450 2f31 2e30 2033 3032 @&a.HTTP/1.0.302
0x0040: 2046 4f55 4e44 0d0a .FOUND..
Ah! The server didn't respond with HTTP/1.1 100 CONTINUE. Curl will wait until it receives a 100 CONTINUE before it sends its 148 byte payload. If your server never sends that response, curl will never send the payload. This can be a problem if your server or your application doesn't know about this HTTP 1.1 behavior.
Here is the same communication, as seen from curl's perspective with the -v flag (for verbose output):
$ curl -v -F name=somevalue http://127.0.0.1:8000
* About to connect() to 127.0.0.1 port 8000 (#0)
* Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.19.6 (i386-apple-darwin9.7.0) libcurl/7.19.6 zlib/1.2.3
> Host: 127.0.0.1:8000
> Accept: */*
> Content-Length: 148
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=----------------------------d70cdce71857
>
* HTTP 1.0, assume close after body
< HTTP/1.0 302 FOUND
< Date: Fri, 18 Sep 2009 15:03:20 GMT
< Server: WSGIServer/0.1 Python/2.6.2
< Vary: Cookie
< Content-Type: text/html; charset=utf-8
< Location: http://127.0.0.1:8000/crm/form/login?next=/
<
* Closing connection #0
So, the fact that the server responds with a 302 FOUND instead of a 100 CONTINUE is really a feature of HTTP 1.1 because it is at this point that the conversation stops. The 302 response will redirect the client to another resource on the server, so why not wait until you actually hit the resource that will receive your POST before you POST your payload? If you're POSTing a file to a URL redirect, then you will end up uploading your file at least twice, and that's a waste of precious upstream bandwidth.
The HTTP/1.1 100 CONTINUE can potentially save you bandwidth, but read the coda at the end of this post for a cautionary tale.
Let's try this again without the Expect header:
curl -H "Expect:" -F name=somevalue http://osx.example.com:8000
The command above is identical to the previous one with the exception of the -H flag. By setting "Expect:" to have no value after the colon, curl will interpret this as deleting the Expect header. Sure enough, when we look at the TCP packet, the Expect header is gone:
10:11:59.308674 IP osx.57803 > osx.8000: P 1:238(237) ack 1 win 65535
<nop,nop,timestamp 1076260192 1076260192>
0x0000: 4500 0121 c4b5 4000 4006 0000 7f00 0001 E..!..@.@.......
0x0010: 7f00 0001 e1cb 1f40 5bb2 5099 2f1c 8bad .......@[.P./...
0x0020: 8018 ffff ff15 0000 0101 080a 4026 6d60 ............@&m`
0x0030: 4026 6d60 504f 5354 202f 2048 5454 502f @&m`POST./.HTTP/
0x0040: 312e 310d 0a55 7365 722d 4167 656e 743a 1.1..User-Agent:
0x0050: 2063 7572 6c2f 372e 3139 2e36 2028 6933 .curl/7.19.6.(i3
0x0060: 3836 2d61 7070 6c65 2d64 6172 7769 6e39 86-apple-darwin9
0x0070: 2e37 2e30 2920 6c69 6263 7572 6c2f 372e .7.0).libcurl/7.
0x0080: 3139 2e36 207a 6c69 622f 312e 322e 330d 19.6.zlib/1.2.3.
0x0090: 0a48 6f73 743a 2031 3237 2e30 2e30 2e31 .Host:.127.0.0.1
0x00a0: 3a38 3030 300d 0a41 6363 6570 743a 202a :8000..Accept:.*
0x00b0: 2f2a 0d0a 436f 6e74 656e 742d 4c65 6e67 /*..Content-Leng
0x00c0: 7468 3a20 3134 380d 0a43 6f6e 7465 6e74 th:.148..Content
0x00d0: 2d54 7970 653a 206d 756c 7469 7061 7274 -Type:.multipart
0x00e0: 2f66 6f72 6d2d 6461 7461 3b20 626f 756e /form-data;.boun
0x00f0: 6461 7279 3d2d 2d2d 2d2d 2d2d 2d2d 2d2d dary=-----------
0x0100: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0110: 2d65 3932 3861 6430 3332 3262 340d 0a0d -e928ad0322b4...
0x0120: 0a
The packet now carries 237 bytes (1:238(237) in tcpdump's notation), which is exactly right. "Expect: 100-continue" is 20 bytes long, plus 2 bytes for \r\n (0d0a in hex), which accounts for the packet being 22 bytes shorter than the previous one, which carried 259 bytes (1:260(259)). As before, the content length is 148 bytes. As before, the packet goes all the way up to the MIME boundary.
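The byte arithmetic is easy to double-check. The snippet below is just a sanity check, with variable names chosen for the example; the two lengths are the values tcpdump prints in parentheses for the two request packets, (259) and (237).

```python
# The header line that disappears when curl is told to drop Expect.
expect_header = b"Expect: 100-continue\r\n"

with_expect = 259     # length tcpdump reports in "P 1:260(259)"
without_expect = 237  # length tcpdump reports in "P 1:238(237)"

print(len(expect_header), with_expect - without_expect)
```

Both numbers printed should be the same, since the only difference between the two requests is that one header line.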
As before, the server sends back a TCP ACK for the bytes received so far (ack 238), but here's the difference. Critically, without the Expect header, curl sends the entire payload before the server responds:
10:11:59.308693 IP osx.8000 > osx.57803: . ack 238 win 65535 <nop,nop,
timestamp 1076260192 1076260192>
0x0000: 4500 0034 05cf 4000 4006 0000 7f00 0001 E..4..@.@.......
0x0010: 7f00 0001 1f40 e1cb 2f1c 8bad 5bb2 5186 .....@../...[.Q.
0x0020: 8010 ffff fe28 0000 0101 080a 4026 6d60 .....(......@&m`
0x0030: 4026 6d60 @&m`
10:11:59.308751 IP osx.57803 > osx.8000: P 238:386(148) ack 1 win 65535
<nop,nop,timestamp 1076260192 1076260192>
0x0000: 4500 00c8 45bc 4000 4006 0000 7f00 0001 E...E.@.@.......
0x0010: 7f00 0001 e1cb 1f40 5bb2 5186 2f1c 8bad .......@[.Q./...
0x0020: 8018 ffff febc 0000 0101 080a 4026 6d60 ............@&m`
0x0030: 4026 6d60 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d @&m`------------
0x0040: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x0050: 2d2d 6539 3238 6164 3033 3232 6234 0d0a --e928ad0322b4..
0x0060: 436f 6e74 656e 742d 4469 7370 6f73 6974 Content-Disposit
0x0070: 696f 6e3a 2066 6f72 6d2d 6461 7461 3b20 ion:.form-data;.
0x0080: 6e61 6d65 3d22 6e61 6d65 220d 0a0d 0a73 name="name"....s
0x0090: 6f6d 6576 616c 7565 0d0a 2d2d 2d2d 2d2d omevalue..------
0x00a0: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d ----------------
0x00b0: 2d2d 2d2d 2d2d 2d2d 6539 3238 6164 3033 --------e928ad03
0x00c0: 3232 6234 2d2d 0d0a 22b4--..
Whoa! Did you notice the "name" ... somevalue above? That's our payload, and it's finally being transmitted. This all happens before the server responds with an HTTP status code. When the response does come, its TCP header carries ack 386: the two POST-related packets held 237 and 148 bytes (385 in total), so 386 is the next byte the server expects.
10:11:59.308768 IP osx.8000 > osx.57803: . ack 386 win 65535 <nop,nop,
timestamp 1076260192 1076260192>
0x0000: 4500 0034 d082 4000 4006 0000 7f00 0001 E..4..@.@.......
0x0010: 7f00 0001 1f40 e1cb 2f1c 8bad 5bb2 521a .....@../...[.R.
0x0020: 8010 ffff fe28 0000 0101 080a 4026 6d60 .....(......@&m`
0x0030: 4026 6d60 @&m`
10:11:59.316155 IP osx.8000 > osx.57803: P 1:21(20) ack 386 win 65535
<nop,nop,timestamp 1076260192 1076260192>
0x0000: 4500 0048 e1c5 4000 4006 0000 7f00 0001 E..H..@.@.......
0x0010: 7f00 0001 1f40 e1cb 2f1c 8bad 5bb2 521a .....@../...[.R.
0x0020: 8018 ffff fe3c 0000 0101 080a 4026 6d60 .....<......@&m`
0x0030: 4026 6d60 4854 5450 2f31 2e30 2033 3032 @&m`HTTP/1.0.302
0x0040: 2046 4f55 4e44 0d0a .FOUND..
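The sequence numbers in these captures can be cross-checked with a few lines of Python. This is a small sketch under the assumption that tcpdump prints data segments as first:next(length), as it does in the output above; the regular expression and function name are my own.

```python
import re

# Matches tcpdump's "first:next(length)" notation, e.g. "1:238(237)".
SEGMENT = re.compile(r"(\d+):(\d+)\((\d+)\)")


def segment_info(summary_line):
    """Return (first_seq, next_seq, length) from a tcpdump summary line, or None."""
    match = SEGMENT.search(summary_line)
    if match is None:
        return None
    first, nxt, length = (int(group) for group in match.groups())
    return first, nxt, length


headers = segment_info(
    "10:11:59.308674 IP osx.57803 > osx.8000: P 1:238(237) ack 1 win 65535")
payload = segment_info(
    "10:11:59.308751 IP osx.57803 > osx.8000: P 238:386(148) ack 1 win 65535")

# The payload starts where the headers ended, and its "next" value is the
# number the server acknowledges once it has everything.
print(headers, payload)
```

A pure ACK line (one with no first:next(length) field) returns None, which is a handy way to skip them when scanning a capture.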
So removing the Expect header allowed curl to send an HTTP 1.1 POST, with its payload, before the server generated HTTP 1.0 302 FOUND. The fact that the server responded with 302 FOUND means the entire POST data was ignored on the server side, but the client DID send it! In other words, we just wasted some bandwidth, and we are going to need to POST the data at least one more time. When curl was expecting an HTTP 1.1 100 CONTINUE instead, it never sent the rest of the payload, and it never complained, not even with -v.
I'll spare you the complete packet dump, but suffice it to say that when curl is invoked with the HTTP 1.0 flag (-0, as in "dash zero"), it works just like when the Expect header is absent. In other words, the following example also sends the payload before waiting for the server to respond.
curl -0 -F name=somevalue http://osx.example.com:8000
This results in:
10:33:25.942611 IP osx.57900 > osx.8000: P 1:238(237) ack 1 win 65535
<nop,nop,timestamp 1076272988 1076272988>
0x0000: 4500 0121 f225 4000 4006 0000 7f00 0001 E..!.%@.@.......
0x0010: 7f00 0001 e22c 1f40 0fcb 74fc 46cf 148b .....,.@..t.F...
0x0020: 8018 ffff ff15 0000 0101 080a 4026 9f5c ............@&.\
0x0030: 4026 9f5c 504f 5354 202f 2048 5454 502f @&.\POST./.HTTP/
0x0040: 312e 300d 0a55 7365 722d 4167 656e 743a 1.0..
After a little conversation with the server, the payload is transmitted before the HTTP response, just like in the previous example.
What is the takeaway message from all of this? If you're using curl to test your REST interface, then make sure you are aware of the behavior of HTTP 1.1 100 CONTINUE. You might notice it because your server receives a blank POST payload. Your HTML forms will appear to have not been filled in, even though you specified one or more -F arguments on the curl command line.
The solution for the versions of curl I've tested is to either remove the Expect header, or to tell curl to use HTTP 1.0 (since curl will default to 1.1 otherwise). Once again, here are those examples:
curl -H "Expect:" -F name=somevalue http://osx.example.com:8000
curl -0 -F name=somevalue http://osx.example.com:8000
This forces curl to POST the payload without waiting for the 100 CONTINUE response, and it is suitable for servers that don't know how to provide a 100 CONTINUE. I hope this helps someone out there to avoid the trouble I had debugging my REST interface.
The "right" way to handle this situation is to make sure your server will send a 100 CONTINUE. curl is smart enough, but your application might not be.
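To illustrate what "sending a 100 CONTINUE" means on the server side, here is a deliberately simplified Python sketch. It is not how mod_wsgi or any particular server implements it; the function name is invented, and it only covers the happy path where the server is willing to accept the body (a real server may instead answer with a final status such as a redirect or an error).

```python
def interim_response(request_head):
    """Return the bytes to send before reading the body, or None if none are needed.

    request_head is the raw request line plus headers, as bytes.
    """
    lines = request_head.split(b"\r\n")
    # Only HTTP/1.1 clients understand the interim 100 response.
    is_http11 = lines[0].endswith(b"HTTP/1.1")
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"expect" and value.strip().lower() == b"100-continue":
            return b"HTTP/1.1 100 Continue\r\n\r\n" if is_http11 else None
    return None


head = (b"POST / HTTP/1.1\r\n"
        b"Host: 127.0.0.1:8000\r\n"
        b"Content-Length: 148\r\n"
        b"Expect: 100-continue\r\n"
        b"\r\n")
print(interim_response(head))
```

Once the client sees that interim response, it sends the 148-byte body, and the server replies with its real status code afterwards.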
In a specific instance, my Django/Django-Piston/mod_wsgi application was having trouble with the 100 CONTINUE when the client set an Expect: 100-continue header. It turns out this was a problem with mod_wsgi 2.5 and the solution is to update to 3.0 (it's RC4 as of this post). Until I realized this was the deeper issue, I was able to get around the problem by preemptively POSTing the entire payload. This worked, but as I said before, it frequently resulted in duplicate uploads (like in 302 and 401 situations).
Before I started preemptively POSTing the whole payload, my application appeared to be receiving blank POST data, so it was responding with a 400 BAD REQUEST. In truth, the POST was blank, so it technically was a bad request. This is not curl's fault, though - it was just being extremely polite, waiting for a proper handshake before sending the payload.
Just watch out, because curl might be too polite - it didn't even tell me that my server was refusing the 100 CONTINUE handshake. This masked a much deeper problem with mod_wsgi, which took several days to sort out.
Freedom, glorious freedom.
Once upon a time, I took a class based on a single question: "what is freedom?" We meandered through US history, identifying several distinct stages in the evolution of the definition of "freedom." I was horrified to learn, during a discussion, that so many of my classmates wanted what I will call "freedom from information." Ah yes - Professor Sandage had a way of bringing the ugliest truths to the surface, for all to witness.
On the one hand, I can understand this desire for freedom from information: telemarketing, advertising, spam, the scrolling headlines at the bottom of a newscast... well, any unsolicited attempt at selling things you don't care about. On the other hand, I think we need more information instead of less, and we need effective tools to filter and manage that information so we only see what we care about.
The term "freedom" is muddied by historical contexts, but also through the process of etymological erosion. With that said, I want to take a moment to discuss the expression, "free as in speech, not beer."
"Free as in speech, not beer" is an expression that comes up in open source discussions all the time. It's a little hard to unpack, unless you really dig into the dual meaning of the word "free." Thanks to Wikipedia, we're part of the way there: the word "free" is used to mean two things: Gratis versus Libre. We call both of these terms "free" nowadays, but once upon a time, there were different words because they are totally different concepts. Gratis means "without charge" whereas Libre is more like "liberty" or "freedom."
So what is free speech? Of course, that's the freedom to say what you want (so long as you accept the consequences for what you've said). And free beer? Well, that would mean beer that is provided at no cost. I think the key is this: although you are free to say what you want, you could well end up in court for it (e.g. slander) and your expression won't come free of charge. On the flipside, you can provide beer free of charge, but not to someone who is 15 years old, so you may not freely provide beer to anyone you wish.
In other words, speech embodies Libre (but not necessarily Gratis) perfectly. Likewise, beer embodies Gratis very well, at the same time that beer is so closely regulated by many governments that it is hardly "libre." Nevertheless, everybody likes a good party with some beer pro gratis.
Speaking of free beer, the Free House is definitely not a place to find such a zero-cost beverage. For starters, the term Free House is mostly British, and always beer-related. It refers to a Public House (which you may know as a "pub") that will sell any kind of beer they can get people to buy. Contrast this with a Tied House, which sells beer manufactured by a single brewer, and you find that the Free House will have several brands on tap. Here, the term "Free" is more like Libre, and is used in the context of the "free market." ...and we all know that the free market isn't composed of things that are zero-cost.
When I was living in Berkeley, California there were two particularly good "Tied House" pubs that brewed and sold only their own brands of beer: Jupiter and Triple Rock. I should also mention Pyramid, which had a pretty cool restaurant with their own beverages on tap. This kind of pub is fun because they'll often have a sampler option to let you taste a small glass of everything they brew. It's a great way to experience the full spectrum of beers, but a word of advice: start with the lightest stuff and progress towards darker. The one exception to this rule is for hoppy beverages (e.g. IPA or APA), which might be light but which may have a pronounced bitter taste. You might want to close it off with an APA, even after drinking the stouts.
There's nothing that goes quite so well with open source software as a tasty hoppy beverage. I like pairing Stone Brewing Company's Arrogant Bastard with GnuPG, the open source implementation of Phil Zimmermann's PGP (Pretty Good Privacy) software. Another favorite of mine is the Spaten Optimator paired with WordPress. More recently, I've taken a liking to Unibroue, the French Canadian brewer, who offers such brews as Trois Pistoles, which is an excellent complement to Python. This last combination is probably the most dangerous of the group, because you might end up with excellent code, and you might end up with British comedy.
At the end of the day, free speech and free beer have a lot to do with open source software. You see, licenses such as the GNU General Public License actually permit developers to charge for their software, while simultaneously requiring all GPL software to be published with its source code. In this sense, the "free beer" part means the software isn't necessarily without cost, and the "free speech" part means you are required to publish the source code. In other words, the Libre aspect of the GPL has an important restriction: you are not free to not publish the source code, which in turn provides the most fundamental tenet of open source software: you are free to read and distribute the source code.
I want to hedge my previous statement: the GPL is a famous topic of debate, so there's plenty of room to criticize anyone who says anything - at all - about the GPL or about open source software, either according to the letter of the license, or according to the spirit of the movement.
Let me sum it up like this: "free" means many things to many people throughout many time-periods, but for some reason, it almost always comes down to a matter of speech and beer.