I’m working on a REST interface at the moment, and there’s nothing I need more than a quick utility to test out various functions.  Curl fills this role perfectly, but I have run into a strange problem that interferes with multipart/form-data form POSTing.  Let me explain some of the evidence I’ve collected, as well as tell you a workaround I learned from an IRC conversation.  In the end, this comes down to the HTTP 1.1 100 CONTINUE response code, which plays a critical role in HTTP 1.1 POST.

Configuration

For starters, I’m testing this out with OS X 10.5:

$ uname -a
Darwin osx.example.com 9.7.0 Darwin Kernel Version 9.7.0: Tue Mar 31 22:52:17
PDT 2009; root:xnu-1228.12.14~1/RELEASE_I386 i386 i386

For all of the output in this post, I used this version of curl:

$ curl --version
curl 7.19.6 (i386-apple-darwin9.7.0) libcurl/7.19.6 zlib/1.2.3
Protocols: tftp ftp telnet dict http file
Features: Largefile libz

Next up, we have the web server I was testing with:

$ curl -I osx.example.com:8000
HTTP/1.0 302 FOUND
Date: Fri, 18 Sep 2009 13:57:22 GMT
Server: WSGIServer/0.1 Python/2.6.2

I’ve been using wireshark and tcpdump to watch the traffic.  Here’s an example invocation of tcpdump that you can work with to replicate the issue:

sudo tcpdump -X -s 1500 -i lo0 tcp port 8000

Obviously, you might run a testing webserver on port 80, and you might send your traffic over lo, eth0, or en0.  If you’re reading this post, you probably know what’s what, and how to modify the command accordingly.

Issuing a simple multipart/form-data POST

It starts when I try to issue a multipart form POST:

curl -F name=somevalue http://osx.example.com:8000

This means “set the field called ‘name’ to ‘somevalue’ and instead of url encoding it, post it as a multipart MIME message.”  Your browser does this any time you upload a file to a website.  In my case, my REST API lets me upload PDF files, so curl needs to use multipart instead of url encoding for this purpose.
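To make "multipart MIME message" a little more concrete, here is a rough Python sketch of the body that curl builds for this one field. It is only an illustration, not curl's code: the helper name build_multipart is made up, and the boundary is copied from a later capture in this post (curl picks a fresh random one on every run).

```python
# Boundary copied from a capture later in this post; curl generates a new
# random boundary on every run, so treat this value as an example only.
BOUNDARY = "-" * 28 + "e928ad0322b4"


def build_multipart(fields, boundary):
    """Encode simple text fields as a multipart/form-data body (bytes)."""
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)            # delimiter before each part
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")                         # blank line ends part headers
        lines.append(value)
    lines.append("--" + boundary + "--")         # closing delimiter
    lines.append("")                             # body ends with a final CRLF
    return "\r\n".join(lines).encode("ascii")


body = build_multipart({"name": "somevalue"}, BOUNDARY)
print(len(body))  # 148, the Content-Length seen in the captures
```

The length works out to the 148 bytes that show up as Content-Length in the packet dumps.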

Curl sends only one TCP packet of data for this curl command, even though the request should take at least two (the headers, then the payload).  This is what that one packet looks like:

10:07:00.971856 IP osx.57777 > osx.8000: P 1:260(259) ack 1 win 65535
<nop,nop,timestamp 1076257225 1076257225>

0x0000:  4500 0137 03b9 4000 4006 0000 7f00 0001  E..7..@.@.......
0x0010:  7f00 0001 e1b1 1f40 3f35 c1d8 0bb4 4ba7  .......@?5....K.
0x0020:  8018 ffff ff2b 0000 0101 080a 4026 61c9  .....+......@&a.
0x0030:  4026 61c9 504f 5354 202f 2048 5454 502f  @&a.POST./.HTTP/
0x0040:  312e 310d 0a55 7365 722d 4167 656e 743a  1.1..User-Agent:
0x0050:  2063 7572 6c2f 372e 3139 2e36 2028 6933  .curl/7.19.6.(i3
0x0060:  3836 2d61 7070 6c65 2d64 6172 7769 6e39  86-apple-darwin9
0x0070:  2e37 2e30 2920 6c69 6263 7572 6c2f 372e  .7.0).libcurl/7.
0x0080:  3139 2e36 207a 6c69 622f 312e 322e 330d  19.6.zlib/1.2.3.
0x0090:  0a48 6f73 743a 2031 3237 2e30 2e30 2e31  .Host:.127.0.0.1
0x00a0:  3a38 3030 300d 0a41 6363 6570 743a 202a  :8000..Accept:.*
0x00b0:  2f2a 0d0a 436f 6e74 656e 742d 4c65 6e67  /*..Content-Leng
0x00c0:  7468 3a20 3134 380d 0a45 7870 6563 743a  th:.148..Expect:
0x00d0:  2031 3030 2d63 6f6e 7469 6e75 650d 0a43  .100-continue..C
0x00e0:  6f6e 7465 6e74 2d54 7970 653a 206d 756c  ontent-Type:.mul
0x00f0:  7469 7061 7274 2f66 6f72 6d2d 6461 7461  tipart/form-data
0x0100:  3b20 626f 756e 6461 7279 3d2d 2d2d 2d2d  ;.boundary=-----
0x0110:  2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d  ----------------
0x0120:  2d2d 2d2d 2d2d 2d37 3937 3037 6465 6438  -------79707ded8
0x0130:  6339 640d 0a0d 0a                        c9d....

We can tell a few things from this.  First, the packet carries 259 bytes of HTTP headers (tcpdump prints this as 1:260(259)), the HTTP Content-Length header indicates a forthcoming payload of 148 bytes, and curl has set the HTTP Expect header to:

Expect: 100-continue

Notice that it gets all the way up to the MIME boundary, which is fine.  The server responds with an ACK (ack 260 means it received all 259 bytes and expects byte 260 next), then sends back an HTTP reply:

10:07:00.971880 IP osx.8000 > osx.57777: . ack 260 win 65535 <nop,nop,
timestamp 1076257225 1076257225>

0x0000:  4500 0034 cda2 4000 4006 0000 7f00 0001  E..4..@.@.......
0x0010:  7f00 0001 1f40 e1b1 0bb4 4ba7 3f35 c2db  .....@....K.?5..
0x0020:  8010 ffff fe28 0000 0101 080a 4026 61c9  .....(......@&a.
0x0030:  4026 61c9                                @&a.

10:07:00.973365 IP osx.8000 > osx.57777: P 1:21(20) ack 260 win 65535
<nop,nop,timestamp 1076257225 1076257225>

0x0000:  4500 0048 1bbf 4000 4006 0000 7f00 0001  E..H..@.@.......
0x0010:  7f00 0001 1f40 e1b1 0bb4 4ba7 3f35 c2db  .....@....K.?5..
0x0020:  8018 ffff fe3c 0000 0101 080a 4026 61c9  .....<......@&a.
0x0030:  4026 61c9 4854 5450 2f31 2e30 2033 3032  @&a.HTTP/1.0.302
0x0040:  2046 4f55 4e44 0d0a                      .FOUND..

Ah!  The server didn’t respond with HTTP/1.1 100 CONTINUE.  Curl holds back its 148 byte payload until it receives a 100 CONTINUE.  Here the server answered with a final status (302 FOUND) instead, so curl never sent the payload.  This can be a problem if your server or your application doesn’t know about this HTTP 1.1 behavior.

Here is the same communication, as seen from curl’s perspective with the -v flag (for verbose output):

$ curl -v -F name=somevalue http://127.0.0.1:8000
* About to connect() to 127.0.0.1 port 8000 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> POST / HTTP/1.1
> User-Agent: curl/7.19.6 (i386-apple-darwin9.7.0) libcurl/7.19.6 zlib/1.2.3
> Host: 127.0.0.1:8000
> Accept: */*
> Content-Length: 148
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=----------------------------d70cdce71857
>
* HTTP 1.0, assume close after body
< HTTP/1.0 302 FOUND
< Date: Fri, 18 Sep 2009 15:03:20 GMT
< Server: WSGIServer/0.1 Python/2.6.2
< Vary: Cookie
< Content-Type: text/html; charset=utf-8
< Location: http://127.0.0.1:8000/crm/form/login?next=/
<
* Closing connection #0

A brief note about HTTP 1.1

So, the fact that the server responds with a 302 FOUND instead of a 100 CONTINUE is really a feature of HTTP 1.1, because the conversation stops at this point, before any payload is sent.  The 302 response redirects the client to another resource on the server, so why not wait until you actually hit the resource that will receive your POST before you POST your payload?  If you POST a file to a URL that redirects, you end up uploading the file at least twice, and that’s a waste of precious upstream bandwidth.

The HTTP/1.1 100 CONTINUE can potentially save you bandwidth, but read the coda at the end of this post for a cautionary tale.
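The decision a client makes after sending Expect: 100-continue can be sketched in a few lines of Python. This is a simplification and not curl's actual code: the function name body_should_be_sent is made up, and it ignores timeouts and any interim codes other than 100.

```python
def body_should_be_sent(status_line):
    """Return True only if the server's first reply is an interim 100."""
    # A status line looks like b"HTTP/1.1 100 Continue\r\n".
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError("not an HTTP status line: %r" % status_line)
    return int(parts[1]) == 100


# The go-ahead the client is hoping for:
print(body_should_be_sent(b"HTTP/1.1 100 Continue\r\n"))  # True
# What the test server in this post actually sent:
print(body_should_be_sent(b"HTTP/1.0 302 FOUND\r\n"))     # False
```

With the 302 from the capture above, the answer is False, which matches what curl did: it kept the payload to itself.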

Issuing a multipart/form-data POST without Expect

Let’s try this again without the Expect header:

curl -H "Expect:" -F name=somevalue http://osx.example.com:8000

The command above is identical to the previous one with the exception of the -H flag.  When “Expect:” is given with no value after the colon, curl interprets this as a request to delete the Expect header.  Sure enough, when we look at the TCP packet, the Expect header is gone:

10:11:59.308674 IP osx.57803 > osx.8000: P 1:238(237) ack 1 win 65535
<nop,nop,timestamp 1076260192 1076260192>

0x0000:  4500 0121 c4b5 4000 4006 0000 7f00 0001  E..!..@.@.......
0x0010:  7f00 0001 e1cb 1f40 5bb2 5099 2f1c 8bad  .......@[.P./...
0x0020:  8018 ffff ff15 0000 0101 080a 4026 6d60  ............@&m`
0x0030:  4026 6d60 504f 5354 202f 2048 5454 502f  @&m`POST./.HTTP/
0x0040:  312e 310d 0a55 7365 722d 4167 656e 743a  1.1..User-Agent:
0x0050:  2063 7572 6c2f 372e 3139 2e36 2028 6933  .curl/7.19.6.(i3
0x0060:  3836 2d61 7070 6c65 2d64 6172 7769 6e39  86-apple-darwin9
0x0070:  2e37 2e30 2920 6c69 6263 7572 6c2f 372e  .7.0).libcurl/7.
0x0080:  3139 2e36 207a 6c69 622f 312e 322e 330d  19.6.zlib/1.2.3.
0x0090:  0a48 6f73 743a 2031 3237 2e30 2e30 2e31  .Host:.127.0.0.1
0x00a0:  3a38 3030 300d 0a41 6363 6570 743a 202a  :8000..Accept:.*
0x00b0:  2f2a 0d0a 436f 6e74 656e 742d 4c65 6e67  /*..Content-Leng
0x00c0:  7468 3a20 3134 380d 0a43 6f6e 7465 6e74  th:.148..Content
0x00d0:  2d54 7970 653a 206d 756c 7469 7061 7274  -Type:.multipart
0x00e0:  2f66 6f72 6d2d 6461 7461 3b20 626f 756e  /form-data;.boun
0x00f0:  6461 7279 3d2d 2d2d 2d2d 2d2d 2d2d 2d2d  dary=-----------
0x0100:  2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d  ----------------
0x0110:  2d65 3932 3861 6430 3332 3262 340d 0a0d  -e928ad0322b4...
0x0120:  0a                                       .

The packet now carries 237 bytes (1:238(237) in tcpdump’s notation), which is exactly right.  “Expect: 100-continue” is 20 bytes long, plus 2 bytes for \r\n (0d0a in hex), which accounts for the packet being 22 bytes shorter than the previous 259 byte packet.  As before, the content length is 148 bytes.  As before, the packet goes all the way up to the MIME boundary.
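If you want to double-check that arithmetic, a couple of lines of Python will do it. The two payload sizes are read off the tcpdump output in this post (the number in parentheses after the sequence range); the variable names are just for illustration.

```python
expect_line = b"Expect: 100-continue\r\n"
with_expect = 259      # payload bytes in the first capture, 1:260(259)
without_expect = 237   # payload bytes in the second capture, 1:238(237)

print(len(expect_line))              # 22
print(with_expect - without_expect)  # 22
```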

As before, the server sends back a TCP ACK for the 237 bytes received (ack 238), but here’s the difference.  Critically, without the Expect header, curl sends the entire payload before the server responds:

10:11:59.308693 IP osx.8000 > osx.57803: . ack 238 win 65535 <nop,nop,
timestamp 1076260192 1076260192>

0x0000:  4500 0034 05cf 4000 4006 0000 7f00 0001  E..4..@.@.......
0x0010:  7f00 0001 1f40 e1cb 2f1c 8bad 5bb2 5186  .....@../...[.Q.
0x0020:  8010 ffff fe28 0000 0101 080a 4026 6d60  .....(......@&m`
0x0030:  4026 6d60                                @&m`

10:11:59.308751 IP osx.57803 > osx.8000: P 238:386(148) ack 1 win 65535
<nop,nop,timestamp 1076260192 1076260192>

0x0000:  4500 00c8 45bc 4000 4006 0000 7f00 0001  E...E.@.@.......
0x0010:  7f00 0001 e1cb 1f40 5bb2 5186 2f1c 8bad  .......@[.Q./...
0x0020:  8018 ffff febc 0000 0101 080a 4026 6d60  ............@&m`
0x0030:  4026 6d60 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d  @&m`------------
0x0040:  2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d  ----------------
0x0050:  2d2d 6539 3238 6164 3033 3232 6234 0d0a  --e928ad0322b4..
0x0060:  436f 6e74 656e 742d 4469 7370 6f73 6974  Content-Disposit
0x0070:  696f 6e3a 2066 6f72 6d2d 6461 7461 3b20  ion:.form-data;.
0x0080:  6e61 6d65 3d22 6e61 6d65 220d 0a0d 0a73  name="name"....s
0x0090:  6f6d 6576 616c 7565 0d0a 2d2d 2d2d 2d2d  omevalue..------
0x00a0:  2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d  ----------------
0x00b0:  2d2d 2d2d 2d2d 2d2d 6539 3238 6164 3033  --------e928ad03
0x00c0:  3232 6234 2d2d 0d0a                      22b4--..

Whoa!  Did you notice the “name” … somevalue above?  That’s our payload, and it’s finally being transmitted.  This all happens before the server responds with an HTTP status code.  When the server sends its response, the TCP header now carries an ACK covering both POST-related packets (237 and 148 bytes, 385 bytes in total, so the server acknowledges up to sequence number 386).

10:11:59.308768 IP osx.8000 > osx.57803: . ack 386 win 65535 <nop,nop,
timestamp 1076260192 1076260192>

0x0000:  4500 0034 d082 4000 4006 0000 7f00 0001  E..4..@.@.......
0x0010:  7f00 0001 1f40 e1cb 2f1c 8bad 5bb2 521a  .....@../...[.R.
0x0020:  8010 ffff fe28 0000 0101 080a 4026 6d60  .....(......@&m`
0x0030:  4026 6d60                                @&m`

10:11:59.316155 IP osx.8000 > osx.57803: P 1:21(20) ack 386 win 65535
<nop,nop,timestamp 1076260192 1076260192>

0x0000:  4500 0048 e1c5 4000 4006 0000 7f00 0001  E..H..@.@.......
0x0010:  7f00 0001 1f40 e1cb 2f1c 8bad 5bb2 521a  .....@../...[.R.
0x0020:  8018 ffff fe3c 0000 0101 080a 4026 6d60  .....<......@&m`
0x0030:  4026 6d60 4854 5450 2f31 2e30 2033 3032  @&m`HTTP/1.0.302
0x0040:  2046 4f55 4e44 0d0a                      .FOUND..

So removing the Expect header allowed curl to send an HTTP 1.1 POST, with its payload, before the server generated HTTP 1.0 302 FOUND.  The fact that the server responded with 302 FOUND means the entire POST data was ignored on the server side, but the client DID send it!  In other words, we just wasted some bandwidth, and we are going to need to POST the data at least one more time.  When curl was expecting an HTTP 1.1 100 CONTINUE instead, it never sent the rest of the payload, and it never complained, not even with -v.

Issuing a multipart/form-data POST using HTTP 1.0

I’ll spare you the complete packet dump, but suffice it to say that when curl is invoked with the HTTP 1.0 flag (-0, as in “dash zero”), it works just like when the Expect header is absent.  In other words, the following example also sends the payload without waiting for the server to respond.

curl -0 -F name=somevalue http://osx.example.com:8000

This results in:

10:33:25.942611 IP osx.57900 > osx.8000: P 1:238(237) ack 1 win 65535
<nop,nop,timestamp 1076272988 1076272988>

0x0000:  4500 0121 f225 4000 4006 0000 7f00 0001  E..!.%@.@.......
0x0010:  7f00 0001 e22c 1f40 0fcb 74fc 46cf 148b  .....,.@..t.F...
0x0020:  8018 ffff ff15 0000 0101 080a 4026 9f5c  ............@&.\
0x0030:  4026 9f5c 504f 5354 202f 2048 5454 502f  @&.\POST./.HTTP/
0x0040:  312e 300d 0a55 7365 722d 4167 656e 743a  1.0..

After a little conversation with the server, the payload is transmitted before the HTTP response, just like in the previous example.

Conclusion

What is the takeaway message from all of this?  If you’re using curl to test your REST interface, then make sure you are aware of how HTTP 1.1 100 CONTINUE behaves.  You might notice it because your server receives a blank POST payload.  Your HTML forms will appear to have not been filled in, even though you specified one or more -F arguments on the curl command line.

The solution for the versions of curl I’ve tested is to either remove the Expect header, or to tell curl to use HTTP 1.0 (since curl will default to 1.1 otherwise).  Once again, here are those examples:

curl -H "Expect:" -F name=somevalue http://osx.example.com:8000
curl -0 -F name=somevalue http://osx.example.com:8000

This forces curl to POST the payload without waiting for the 100 CONTINUE response, and it is suitable for servers that don’t know how to provide a 100 CONTINUE.  I hope this helps someone out there to avoid the trouble I had debugging my REST interface.

Coda: fix your server!

The “right” way to handle this situation is to make sure your server will send a 100 CONTINUE.  curl is smart enough, but your application might not be.
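As a rough illustration of what a server that does send 100 CONTINUE looks like, here is a sketch using Python's standard library http.server. This is not my Django/mod_wsgi setup; UploadHandler and demo are made-up names, the payload is a placeholder rather than a real multipart body, and the raw socket client only stands in for curl so the example can run by itself. As far as I know, BaseHTTPRequestHandler answers Expect: 100-continue on its own once protocol_version is set to HTTP/1.1.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class UploadHandler(BaseHTTPRequestHandler):
    # With HTTP/1.1 selected, the standard library replies to
    # "Expect: 100-continue" with an interim 100 before do_POST runs.
    protocol_version = "HTTP/1.1"

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        reply = b"received %d bytes\n" % len(body)
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo(body=b"name=somevalue"):
    """Start the server on a free local port and run one Expect handshake."""
    server = HTTPServer(("127.0.0.1", 0), UploadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with socket.create_connection(server.server_address, timeout=5) as sock:
            head = (
                "POST / HTTP/1.1\r\n"
                "Host: 127.0.0.1\r\n"
                "Content-Length: %d\r\n"
                "Expect: 100-continue\r\n"
                "\r\n" % len(body)
            ).encode("ascii")
            sock.sendall(head)
            interim = sock.recv(1024)   # wait for the server's go-ahead
            sock.sendall(body)          # only now send the payload
            final = b""
            while True:                 # read until the server closes
                chunk = sock.recv(1024)
                if not chunk:
                    break
                final += chunk
    finally:
        server.shutdown()
        server.server_close()
    return interim, final


interim, final = demo()
print(interim.split(b"\r\n")[0])
print(final.split(b"\r\n")[0])
```

The first line printed should be the interim 100 status and the second the final 200, which is the handshake curl was politely waiting for.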

In a specific instance, my Django/Django-Piston/mod_wsgi application was having trouble with the 100 CONTINUE when the client set an Expect: 100-continue header. It turns out this was a problem with mod_wsgi 2.5 and the solution is to update to 3.0 (it’s RC4 as of this post). Until I realized this was the deeper issue, I was able to get around the problem by preemptively POSTing the entire payload.  This worked, but as I said before, it frequently resulted in duplicate uploads (as in 302 and 401 situations).

Before I started preemptively POSTing the whole payload, my application appeared to be receiving blank POST data, so it was responding with a 400 BAD REQUEST.   In truth, the POST was blank, so it technically was a bad request.  This is not curl’s fault, though - it was just being extremely polite, waiting for a proper handshake before sending the payload.

Just watch out, because curl might be too polite - it didn’t even tell me that my server was refusing the 100 CONTINUE handshake.  This masked a much deeper problem with mod_wsgi, which took several days to sort out.