Programming languages like Python are the languages we speak to computers.
Protocols are the languages that computers speak to each other.
This session we'll look at a few of them.
Questions from the Homework?
Examples of an echo server using select
a set of rules or conventions
governing communications
Life has lots of sets of rules for how to do things.
Digital life has lots of rules too:
What does this look like in practice?
Over the next few slides we'll be looking at server/client interactions.
Each interaction is line-based, each line represents one message.
Messages from the Server to the Client are prefaced with S (<--)
Messages from the Client to the Server are prefaced with C (-->)
All lines end with the character sequence <CRLF> (\r\n)
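As a rough illustration of the line-based rule, here is a minimal sketch of how a receiver might cut a buffer of received bytes into complete messages. The name split_messages is made up for this example; it is not part of any protocol library.

```python
CRLF = b"\r\n"


def split_messages(buffer):
    """Split received bytes into complete lines plus any unfinished remainder."""
    # Everything before the last CRLF is a complete message; whatever
    # follows it has not been terminated yet and must wait for more data.
    *lines, remainder = buffer.split(CRLF)
    return lines, remainder


lines, rest = split_messages(b"EHLO bar.com\r\nQUIT\r\nQU")
print(lines)  # [b'EHLO bar.com', b'QUIT']
print(rest)   # b'QU'
```

The remainder matters because a single recv call can stop in the middle of a line.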
What does SMTP look like?
SMTP (Say hello and identify yourself):
S (<--): 220 foo.com Simple Mail Transfer Service Ready
C (-->): EHLO bar.com
S (<--): 250-foo.com greets bar.com
S (<--): 250-8BITMIME
S (<--): 250-SIZE
S (<--): 250-DSN
S (<--): 250 HELP
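The server's answer to EHLO spans several lines: every line but the last has a hyphen after the code, and the last has a space. A small sketch of reading that convention, assuming the lines have already had their <CRLF> removed (parse_reply_line is a hypothetical helper name):

```python
def parse_reply_line(line):
    """Split one SMTP reply line into (code, is_last, text)."""
    code = int(line[:3])
    # A hyphen after the code means more lines follow;
    # a space marks the final line of the reply.
    is_last = line[3:4] != "-"
    text = line[4:]
    return code, is_last, text


reply = [
    "250-foo.com greets bar.com",
    "250-8BITMIME",
    "250-SIZE",
    "250-DSN",
    "250 HELP",
]
parsed = [parse_reply_line(line) for line in reply]
print(parsed[0])   # (250, False, 'foo.com greets bar.com')
print(parsed[-1])  # (250, True, 'HELP')
```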
What does SMTP look like?
SMTP (Ask for information, provide answers):
C (-->): MAIL FROM:<Smith@bar.com>
S (<--): 250 OK
C (-->): RCPT TO:<Jones@foo.com>
S (<--): 250 OK
C (-->): RCPT TO:<Green@foo.com>
S (<--): 550 No such user here
C (-->): DATA
S (<--): 354 Start mail input; end with <CRLF>.<CRLF>
C (-->): Blah blah blah...
C (-->): ...etc. etc. etc.
C (-->): .
S (<--): 250 OK
What does SMTP look like?
SMTP (Say goodbye):
C (-->): QUIT
S (<--): 221 foo.com Service closing transmission channel
(250 reply to EHLO above)
What does POP3 look like?
POP3 (Say hello and identify yourself):
C (-->): <client connects to service port 110>
S (<--): +OK POP3 server ready <1896.6971@mailgate.dobbs.org>
C (-->): USER bob
S (<--): +OK bob
C (-->): PASS redqueen
S (<--): +OK bob's maildrop has 2 messages (320 octets)
What does POP3 look like?
POP3 (Ask for information, provide answers):
C (-->): STAT
S (<--): +OK 2 320
C (-->): LIST
S (<--): +OK 1 messages (120 octets)
S (<--): 1 120
S (<--): .
What does POP3 look like?
POP3 (Ask for information, provide answers):
C (-->): RETR 1
S (<--): +OK 120 octets
S (<--): <server sends the text of message 1>
S (<--): .
C (-->): DELE 1
S (<--): +OK message 1 deleted
What does POP3 look like?
POP3 (Say goodbye):
C (-->): QUIT
S (<--): +OK dewey POP3 server signing off (maildrop empty)
C (-->): <client hangs up>
The codes don't really look the same, though, do they?
The exception to the one-line-per-message rule is payload
In both SMTP and POP3 this is terminated by <CRLF>.<CRLF>
In SMTP, the client has this ability
But in POP3, it belongs to the server.
Why?
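Whichever side sends the payload, the receiver has to keep reading lines until it sees the terminator. A minimal sketch, assuming the lines have already been split on <CRLF>; read_payload is a made-up name, and the sketch ignores the extra escaping rule real servers apply to payload lines that themselves begin with a dot.

```python
def read_payload(lines):
    """Collect payload lines up to, but not including, a line holding only '.'."""
    payload = []
    for line in lines:
        if line == ".":
            break  # the lone dot ends the payload
        payload.append(line)
    return payload


message = ["Blah blah blah...", "...etc. etc. etc.", ".", "QUIT"]
print(read_payload(message))  # ['Blah blah blah...', '...etc. etc. etc.']
```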
What does IMAP look like?
IMAP (Say hello and identify yourself):
C (-->): <client connects to service port 143>
S (<--): * OK example.com IMAP4rev1 v12.264 server ready
C (-->): A0001 LOGIN "frobozz" "xyzzy"
S (<--): * OK User frobozz authenticated
What does IMAP look like?
IMAP (Ask for information, provide answers [connect to an inbox]):
C (-->): A0002 SELECT INBOX
S (<--): * 1 EXISTS
S (<--): * 1 RECENT
S (<--): * FLAGS (\Answered \Flagged \Deleted \Draft \Seen)
S (<--): * OK [UNSEEN 1] first unseen message in /var/spool/mail/esr
S (<--): A0002 OK [READ-WRITE] SELECT completed
What does IMAP look like?
IMAP (Ask for information, provide answers [Get message sizes]):
C (-->): A0003 FETCH 1 RFC822.SIZE
S (<--): * 1 FETCH (RFC822.SIZE 2545)
S (<--): A0003 OK FETCH completed
What does IMAP look like?
IMAP (Ask for information, provide answers [Get first message header]):
C (-->): A0004 FETCH 1 BODY[HEADER]
S (<--): * 1 FETCH (RFC822.HEADER {1425}
<server sends 1425 octets of message payload>
S (<--): )
S (<--): A0004 OK FETCH completed
What does IMAP look like?
IMAP (Ask for information, provide answers [Get first message body]):
C (-->): A0005 FETCH 1 BODY[TEXT]
S (<--): * 1 FETCH (BODY[TEXT] {1120}
<server sends 1120 octets of message payload>
S (<--): )
S (<--): * 1 FETCH (FLAGS (\Recent \Seen))
S (<--): A0005 OK FETCH completed
What does IMAP look like?
IMAP (Say goodbye):
C (-->): A0006 LOGOUT
S (<--): * BYE example.com IMAP4rev1 server terminating connection
S (<--): A0006 OK LOGOUT completed
C (-->): <client hangs up>
Compared with POP3, what do these differences suggest?
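One visible difference is the tag that starts every IMAP line. A short sketch of how a client might tell untagged server data (lines starting with *) from the line that completes one of its own commands; split_tag is a hypothetical helper, not an imaplib function.

```python
def split_tag(line):
    """Return (tag, rest); the tag is '*' for untagged server data."""
    tag, _, rest = line.partition(" ")
    return tag, rest


responses = [
    "* 1 FETCH (RFC822.SIZE 2545)",
    "A0003 OK FETCH completed",
]
for line in responses:
    tag, rest = split_tag(line)
    kind = "untagged" if tag == "*" else "completes command " + tag
    print(kind, "->", rest)
```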
Let's try this out for ourselves!
Begin by importing the imaplib module from the Python Standard Library:
In [1]: import imaplib
In [2]: dir(imaplib)
Out[2]:
['AllowedVersions',
'CRLF',
'Commands',
...
'timedelta',
'timezone']
In [3]: imaplib.Debug = 4
Setting imaplib.Debug shows us what is sent and received
I've prepared a server for us to use, but we'll need to set up a client to speak to it.
Our server requires SSL (Secure Sockets Layer) for IMAP connections, so let's initialize an IMAP4_SSL client and authenticate:
In [4]: conn = imaplib.IMAP4_SSL('mail.webfaction.com')
22:40.32 imaplib version 2.58
22:40.32 new IMAP4 connection, tag=b'IMKC'
22:40.38 < b'* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN] Dovecot ready.'
22:40.38 > b'IMKC0 CAPABILITY'
22:40.45 < b'* CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN'
22:40.45 < b'IMKC0 OK Capability completed.'
22:40.45 CAPABILITIES: ('IMAP4REV1', 'LITERAL+', 'SASL-IR', 'LOGIN-REFERRALS', 'ID', 'ENABLE', 'IDLE', 'AUTH=PLAIN')
In [5]: conn.login('crisewing_demobox', 's00p3rs3cr3t')
22:59.92 > b'IMKC1 LOGIN crisewing_demobox "s00p3rs3cr3t"'
23:01.79 < b'* CAPABILITY IMAP4rev1 SASL-IR SORT THREAD=REFERENCES MULTIAPPEND UNSELECT LITERAL+ IDLE CHILDREN NAMESPACE LOGIN-REFERRALS STARTTLS AUTH=PLAIN'
23:01.79 < b'IMKC1 OK Logged in.'
Out[5]: ('OK', [b'Logged in.'])
We can start by listing the mailboxes we have on the server:
In [6]: conn.list()
26:30.64 > b'IMKC2 LIST "" *'
26:30.72 < b'* LIST (\\HasNoChildren) "." "Trash"'
26:30.72 < b'* LIST (\\HasNoChildren) "." "Drafts"'
26:30.72 < b'* LIST (\\HasNoChildren) "." "Sent"'
26:30.72 < b'* LIST (\\HasNoChildren) "." "Junk"'
26:30.72 < b'* LIST (\\HasNoChildren) "." "INBOX"'
26:30.72 < b'IMKC2 OK List completed.'
Out[6]:
('OK',
[b'(\\HasNoChildren) "." "Trash"',
b'(\\HasNoChildren) "." "Drafts"',
b'(\\HasNoChildren) "." "Sent"',
b'(\\HasNoChildren) "." "Junk"',
b'(\\HasNoChildren) "." "INBOX"'])
To interact with our email, we must select a mailbox from the list we received earlier:
In [7]: conn.select('INBOX')
27:20.96 > b'IMKC3 SELECT INBOX'
27:21.04 < b'* FLAGS (\\Answered \\Flagged \\Deleted \\Seen \\Draft)'
27:21.04 < b'* OK [PERMANENTFLAGS (\\Answered \\Flagged \\Deleted \\Seen \\Draft \\*)] Flags permitted.'
27:21.04 < b'* 1 EXISTS'
27:21.04 < b'* 0 RECENT'
27:21.04 < b'* OK [UNSEEN 1] First unseen.'
27:21.04 < b'* OK [UIDVALIDITY 1357449499] UIDs valid'
27:21.04 < b'* OK [UIDNEXT 24] Predicted next UID'
27:21.04 < b'IMKC3 OK [READ-WRITE] Select completed.'
Out[7]: ('OK', [b'1'])
We can search our selected mailbox for messages matching one or more criteria.
The return value is a status and a list of bytestrings containing the sequence numbers of messages that match our search:
In [8]: conn.search(None, '(FROM "cris")')
28:43.02 > b'IMKC4 SEARCH (FROM "cris")'
28:43.09 < b'* SEARCH 1'
28:43.09 < b'IMKC4 OK Search completed.'
Out[8]: ('OK', [b'1'])
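The matching message numbers come back packed into one bytestring, separated by spaces. A small sketch of unpacking them, using the value from Out[8] copied in by hand so it runs without a server:

```python
# The value shown in Out[8], copied here so the sketch runs offline.
status, data = ('OK', [b'1'])

# data[0] holds space-separated message numbers as a single bytestring.
message_numbers = data[0].split()
print(message_numbers)  # [b'1']

# The calls in this session pass message numbers as strings,
# so decode them before reusing them.
as_text = [num.decode('ascii') for num in message_numbers]
print(as_text)  # ['1']
```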
Once we've found a message we want to look at, we can use the fetch command to read it from the server.
IMAP allows fetching each part of a message independently:
In [9]: conn.fetch('1', 'BODY[HEADER]')
...
Out[9]: ('OK', ...)
In [10]: conn.fetch('1', 'FLAGS')
...
Out[10]: ('OK', [b'1 (FLAGS (\\Seen))'])
In [11]: conn.fetch('1', 'BODY[TEXT]')
...
Out[11]: ('OK', ...)
What does the message say?
Python even includes an email library that would allow us to interact with this message in an OO style.
But in every case we've seen, we could do the same thing with a socket and some strings
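For a taste of the email library mentioned above, here is a minimal sketch. The message bytes are invented for the example; they stand in for what a fetch of a whole message might return.

```python
import email

# A made-up message, standing in for bytes fetched from a server.
raw = (
    b"From: cris@example.com\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"This is the body.\r\n"
)

msg = email.message_from_bytes(raw)
print(msg["From"])     # cris@example.com
print(msg["Subject"])  # Hello
print(msg.get_payload())
```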
Let's take a few minutes here to clear our heads.
When we return, we'll learn about the king of protocols,
HTTP
HTTP is no different
HTTP is also message-centered, with two-way communications:
HTTP (Ask for information):
GET /index.html HTTP/1.1<CRLF>
Host: www.example.com<CRLF>
<CRLF>
note: the <CRLF> you see here is a visualization of the \r\n character sequence.
HTTP (Provide answers):
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: none
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8
<CRLF>
<!DOCTYPE html>\n<html>\n <head>\n <title>This is a .... </html>
Pay particular attention to the <CRLF> on a line by itself.
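That blank line is what lets a program find where the headers stop and the body starts. A minimal sketch of splitting a raw message on it; split_message is a hypothetical helper and skips the validation a real parser would need.

```python
def split_message(raw):
    """Split raw HTTP bytes into (start_line, headers, body)."""
    # The first blank line (CRLF on a line by itself) ends the headers.
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello"
)
start_line, headers, body = split_message(raw)
print(start_line)               # HTTP/1.1 200 OK
print(headers["content-type"])  # text/plain
print(body)                     # b'hello'
```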
In HTTP, both request and response share a common basic format:
Let's investigate the HTTP protocol a bit in real life.
We'll do so by building a simplified HTTP server, one step at a time.
There is a copy of the echo server from last time in resources/session02. It's called http_server.py.
In a terminal, move into that directory. We'll be doing our work here for the rest of the session.
Test Driven Development (TDD) is all the rage these days.
It means that before you write code, you first write tests demonstrating what you want your code to do.
When all your tests pass, you are finished. You did this for your last assignment.
We'll be doing it again today.
From inside resources/session02, open a second terminal and start the server:
$ python http_server.py
In your first terminal, run the tests. You should see similar output:
$ python tests.py
[...]
Ran 10 tests in 0.054s
FAILED (failures=3, errors=7)
Let's take a few minutes here to look at these tests and understand them.
Our job is to make all those tests pass.
First, though, let's pretend this server really is a functional HTTP server.
This time, instead of using the echo client to make a connection to the server, let's use a web browser!
Point your favorite browser at http://localhost:10000
First, look at the printed output from your echo server.
Second, note that your browser is still waiting to finish loading the page
Moreover, your server should also be hung, waiting for more from the 'client'
This is because the server is waiting for the browser to respond
And at the same time, the browser is waiting for the server to indicate it is done.
Our server does not yet speak the HTTP protocol, but the browser is expecting it.
Kill your server with ctrl-c (the keyboard interrupt) and you should see some printed content in your browser:
GET / HTTP/1.1
Host: localhost:10000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:22.0) Gecko/20100101 Firefox/22.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Cookie: __utma=111872281.383966302.1364503233.1364503233.1364503233.1; __utmz=111872281.1364503233.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); csrftoken=uiqj579iGRbReBHmJQNTH8PFfAz2qRJS
Connection: keep-alive
Cache-Control: max-age=0
Your server is simply echoing what it receives, so this is an HTTP Request as sent by your browser.
When working on HTTP applications, it's nice to be able to see all this going back and forth.
Good browsers support this with a set of developer tools built-in.
The 'Net(work)' pane of these tools can show you both request and response, headers and all. Very useful.
Let's take a quick look
Sometimes you need or want to debug HTTP requests that are not going through your browser.
Or perhaps you need functionality that is not supported by in-browser tools (request munging, header mangling, decryption of HTTPS requests and responses).
Then it might be time for an HTTP debugging proxy:
We won't cover any of these tools here today. But you can check them out when you have the time.
In HTTP 1.0, the only required line in an HTTP request is this:
GET /path/to/index.html HTTP/1.0<CRLF>
<CRLF>
As virtual hosting grew more common, that was not enough, so HTTP 1.1 adds a single required header, Host:
GET /path/to/index.html HTTP/1.1<CRLF>
Host: www.mysite1.com:80<CRLF>
<CRLF>
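As a sketch of what a client has to put on the wire for that HTTP 1.1 request, here is one way to assemble the bytes. The function name build_request is invented for this example.

```python
def build_request(path, host):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "",  # the blank line that ends the headers
        "",  # so that the join finishes with CRLF CRLF
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_request("/path/to/index.html", "www.mysite1.com:80"))
# b'GET /path/to/index.html HTTP/1.1\r\nHost: www.mysite1.com:80\r\n\r\n'
```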
In both HTTP 1.0 and 1.1, a proper response consists of an initial line, followed by optional headers, a single blank line, and then optionally a response body:
HTTP/1.1 200 OK<CRLF>
Content-Type: text/plain<CRLF>
<CRLF>
this is a pretty minimal response
Let's update our server to return such a response.
Begin by implementing a new function in your http_server.py script called response_ok.
It can be super-simple for now. We'll improve it later.
It needs to return our minimal response from above:
HTTP/1.1 200 OK<CRLF>
Content-Type: text/plain<CRLF>
<CRLF>
this is a pretty minimal response
Remember, <CRLF> is a placeholder for the \r\n character sequence
def response_ok():
    """returns a basic HTTP response"""
    resp = []
    resp.append(b"HTTP/1.1 200 OK")
    resp.append(b"Content-Type: text/plain")
    resp.append(b"")
    resp.append(b"this is a pretty minimal response")
    return b"\r\n".join(resp)
Did you remember that sockets only accept bytes?
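If you want to see the bytes-only rule in action without a server, this sketch uses socket.socketpair from the standard library to get two connected sockets. The message text is arbitrary.

```python
import socket

# socketpair() gives two connected sockets without needing a server.
left, right = socket.socketpair()
try:
    try:
        left.sendall("text is not accepted")
    except TypeError as err:
        print("refused:", err)

    # Encoding turns the text into bytes, which the socket accepts.
    left.sendall("but encoded text is".encode("utf8"))
    received = right.recv(1024)
    print(received.decode("utf8"))  # but encoded text is
finally:
    left.close()
    right.close()
```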
We've now implemented a function that is tested by our tests. Let's run them again:
$ python tests.py
[...]
----------------------------------------------------------------------
Ran 10 tests in 0.002s
FAILED (failures=3, errors=3)
Great! We've now got 4 tests that pass. Good work.
Next, we need to rebuild the server loop from our echo server for its new purpose:
It should now wait for an incoming request to be finished, then send a response back to the client.
The response it sends can be the result of calling our new response_ok
function for now.
We could also bump up the recv buffer size to something more reasonable for HTTP traffic, say 1024.
# ...
try:
    while True:
        print('waiting for a connection', file=log_buffer)
        conn, addr = sock.accept() # blocking
        try:
            print('connection - {0}:{1}'.format(*addr), file=log_buffer)
            while True:
                data = conn.recv(1024)
                if len(data) < 1024:
                    break
            print('sending response', file=log_buffer)
            response = response_ok()
            conn.sendall(response)
        finally:
            conn.close()
# ...
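The short-read check in that loop is a convenient heuristic for class, but it can be fooled when a request arrives in several pieces. As an alternative sketch (not the code used in this session), a server could keep reading until it has seen the blank line that ends the headers. The helper name read_request_head is hypothetical, and the fake recv below stands in for conn.recv.

```python
def read_request_head(recv, bufsize=1024):
    """Call recv(bufsize) until the header-ending blank line has arrived."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = recv(bufsize)
        if not chunk:  # the client closed the connection
            break
        data += chunk
    return data


# A stand-in for conn.recv that hands back a request in small pieces.
pieces = iter([b"GET / HT", b"TP/1.1\r\nHost: localhost", b":10000\r\n\r\n"])
fake_recv = lambda bufsize: next(pieces, b"")

request_bytes = read_request_head(fake_recv)
print(request_bytes)
# b'GET / HTTP/1.1\r\nHost: localhost:10000\r\n\r\n'
```

This is enough for GET requests, which carry no body; requests with a body would also need the Content-Length header.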
Once you've got that set, restart your server:
$ python http_server.py
Then you can re-run your tests:
$ python tests.py
[...]
----------------------------------------------------------------------
Ran 10 tests in 0.003s
FAILED (failures=2, errors=3)
Five tests now pass!
Every HTTP request must begin with a single line, broken by whitespace into three parts:
GET /path/to/index.html HTTP/1.1
The three parts are the method, the URI, and the protocol
Let's look at each in turn.
GET /path/to/index.html HTTP/1.1
Four of these methods (GET, POST, PUT and DELETE) are mapped to the four basic steps (CRUD) of persistent storage: POST creates, GET reads, PUT updates and DELETE deletes.
HTTP methods can be categorized as safe or unsafe, based on whether they might change something on the server:
This is a normative distinction, which is to say be careful
HTTP methods can also be categorized as idempotent or not.
An idempotent request has the same effect on the server no matter how many times it is repeated:
Again, normative. The developer is responsible for ensuring that it is true.
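One way to keep the two categories straight is a small lookup table. This is only a sketch covering four common methods; the names METHOD_PROPERTIES and may_retry are invented for the example.

```python
# (safe, idempotent) for each method, following the HTTP specification.
METHOD_PROPERTIES = {
    "GET": (True, True),
    "POST": (False, False),
    "PUT": (False, True),
    "DELETE": (False, True),
}


def may_retry(method):
    """An idempotent request can be repeated without changing the outcome."""
    safe, idempotent = METHOD_PROPERTIES[method]
    return idempotent


print(may_retry("PUT"))   # True
print(may_retry("POST"))  # False
```

Remember that the table only records what the methods promise. Your own code has to keep that promise.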
Let's keep things simple: our server will only respond to GET requests.
We need to create a function that parses a request and determines if we can respond to it: parse_request.
If the request method is not GET, our function should raise an error.
Remember, although a request is more than one line long, all we care about here is the first line
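import sys  # needed for sys.stderr; belongs with the imports at the top of the file

def parse_request(request):
    # only the first line of the request matters here
    first_line = request.split("\r\n", 1)[0]
    method, uri, protocol = first_line.split()
    if method != "GET":
        raise NotImplementedError("We only accept GET")
    print('request is okay', file=sys.stderr)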
We'll also need to update the server code. It should accumulate the incoming request as text, pass it to parse_request, and then send the OK response:
# ...
conn, addr = sock.accept() # blocking
try:
    print('connection - {0}:{1}'.format(*addr), file=log_buffer)
    request = ""
    while True:
        data = conn.recv(1024)
        request += data.decode('utf8')
        if len(data) < 1024 or not data:
            break
    parse_request(request)
    print('sending response', file=log_buffer)
    response = response_ok()
    conn.sendall(response)
finally:
    conn.close()
# ...
Quit and restart your server now that you've updated the code:
$ python http_server.py
At this point, we should have seven tests passing:
$ python tests.py
Ran 10 tests in 0.002s
FAILED (failures=1, errors=2)
The server quit during the tests, but an HTTP request from the browser should work fine now.
Restart the server and reload your browser. You should see your OK response.
We can use the simple_client.py script in our resources to test our error condition. In a second terminal window run the script like so:
$ python simple_client.py "POST / HTTP/1.0\r\n\r\n"
This should cause the server to crash.
Okay, so the outcome there was pretty ugly. The client went off the rails, and our server has terminated as well.
Why?
The HTTP protocol allows us to handle errors like this more gracefully.
Enter the Response Code
HTTP/1.1 200 OK
All HTTP responses must include a response code indicating the outcome of the request.
The text bit makes the code more human-readable
There are certain HTTP response codes you are likely to see (and use) most often:
200 OK - Everything is good
301 Moved Permanently - You should update your link
304 Not Modified - You should load this from cache
404 Not Found - You've asked for something that doesn't exist
500 Internal Server Error - Something bad happened
Do not be afraid to use other, less common codes in building good apps. There are a lot of them for a reason.
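The standard library already knows the text that goes with each code. A small sketch using http.HTTPStatus to build the first line of a response; status_line is a made-up helper name.

```python
from http import HTTPStatus


def status_line(code, protocol="HTTP/1.1"):
    """Build the first line of a response from a numeric code."""
    status = HTTPStatus(code)
    return "{} {} {}".format(protocol, status.value, status.phrase)


print(status_line(200))  # HTTP/1.1 200 OK
print(status_line(404))  # HTTP/1.1 404 Not Found
print(status_line(405))  # HTTP/1.1 405 Method Not Allowed
```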
Luckily, there's an error code that is tailor-made for this situation.
The client has made a request using a method we do not support
405 Method Not Allowed
Let's add a new function that returns this error code. It should be called response_method_not_allowed.
Remember, it must be a complete HTTP Response with the correct code.
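def response_method_not_allowed():
    """returns a 405 Method Not Allowed response"""
    resp = []
    resp.append(b"HTTP/1.1 405 Method Not Allowed")
    # two empty strings, so the joined response ends with the blank
    # line (CRLF CRLF) that closes the header section
    resp.append(b"")
    resp.append(b"")
    return b"\r\n".join(resp)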
Again, we'll need to update the server to handle this error condition correctly. It should catch the exception raised by the parse_request function and send the 405 response; otherwise it should send the OK response:
# ...
while True:
    data = conn.recv(1024)
    request += data.decode('utf8')
    if len(data) < 1024:
        break
try:
    parse_request(request)
except NotImplementedError:
    response = response_method_not_allowed()
else:
    response = response_ok()
print('sending response', file=log_buffer)
conn.sendall(response)
# ...
Start your server (or restart it if by some miracle it's still going).
Then run the tests again:
$ python tests.py
[...]
Ran 10 tests in 0.002s
OK
Wahoo! All our tests are passing. That means we are done writing code for now.
We've got a very simple server that accepts a request and sends a response. But what happens if we make a different request?
In your web browser, enter the following URL:
http://localhost:10000/page
What happened? What happens if you use this URL?
http://localhost:10000/section/page
We expect different URLs to result in different responses.
Each separate path provided should map to a resource
But this isn't happening with our server, for obvious reasons.
It brings us back to the second element of that first line of an HTTP request.
The Return of the URI
GET /path/to/index.html HTTP/1.1
Our parse_request function actually already finds the uri in the first line of a request.
All we need to do is update the function so that it returns that uri.
Then we can use it.
def parse_request(request):
    first_line = request.split("\r\n", 1)[0]
    method, uri, protocol = first_line.split()
    if method != "GET":
        raise NotImplementedError("We only accept GET")
    print('request is okay', file=sys.stderr)
    # add the following line:
    return uri
Now we can update our server code so that it uses the return value of parse_request.
That's a pretty simple change:
try:
    uri = parse_request(request)  # update this line
except NotImplementedError:
    response = response_method_not_allowed()
else:
    # and modify this block
    try:
        content, mime_type = resolve_uri(uri)
    except NameError:
        response = response_not_found()
    else:
        response = response_ok(content, mime_type)
You may have noticed that we just added calls to functions that don't yet exist
It's a program that shows you what you want to do, but won't actually run.
For your homework this week you will create these functions, completing the HTTP server.
Your starting point will be what we've made here in class.
I've added a directory to resources/session02 called homework.
In it, you'll find this http_server.py file we've just written in class.
That file also contains enough stub code for the missing functions to let the server run.
And there are more tests for you to make pass!
Take the following steps one at a time. Run the tests in assignments/session02/homework between steps to ensure that you are getting it right.
Complete the resolve_uri function so that it handles looking up resources on disk using the URI returned by parse_request.
Complete the response_not_found function stub so that it returns a 404 response.
Update response_ok so that it uses the values returned by resolve_uri for the URI. (These have already been added to the function signature.)
Along the way, you'll discover that simply returning the content of a file as an HTTP response body is insufficient. Different types of content need to be identified to your browser.
We can fix this by passing information about exactly what we are returning as part of the response.
HTTP provides for this type of thing with the generic idea of Headers
Both requests and responses can contain headers of the form Name: Value
read more about HTTP headers: http://www.cs.tut.fi/~jkorpela/http.html
A very common header used in HTTP responses is Content-Type. It tells the client what to expect.
Content-Type: image/jpeg
Content-Type: image/png
Content-Type: text/plain
Content-Type: text/html
There are many mime-type identifiers: http://www.freeformatter.com/mime-types-list.html
By mapping a given file to a mime-type, we can write a header.
The standard lib module mimetypes does just this.
We can guess the mime-type of a file based on the filename or map a file extension to a type:
>>> import mimetypes
>>> mimetypes.guess_type('file.txt')
('text/plain', None)
>>> mimetypes.types_map['.txt']
'text/plain'
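Putting the two ideas together, here is a minimal sketch of turning a file name into a Content-Type header line. The helper name content_type_header and the choice of text/plain as a fallback are assumptions made for this example.

```python
import mimetypes


def content_type_header(filename, default="text/plain"):
    """Return a Content-Type header line (as bytes) for the given file name."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # guess_type returns None when it does not recognize the extension
        mime_type = default
    return "Content-Type: {}".format(mime_type).encode("ascii")


print(content_type_header("sample.txt"))  # b'Content-Type: text/plain'
print(content_type_header("photo.png"))   # b'Content-Type: image/png'
```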
Your resolve_uri function will need to accomplish the following tasks:
text/plain.
One of the benefits of test-driven development is that the tests that are failing should tell you what code you need to write.
As you work your way through the steps outlined above, look at your tests. Write code that makes them pass.
If all the tests in assignments/session02/tests.py are passing, you've completed the assignment.
To submit your homework:
Add your work to the assignments/session02 directory of your fork of the class repository, then open a pull request.
I will review your work when I receive your pull requests, make comments on it there, and then close the pull request.
If you are able to finish the above in less than 4-6 hours, consider taking on one or more of the following challenges:
Add a Date: header in the proper format (RFC-1123) to responses. Hint: see email.utils.formatdate in the Python standard library.
Add a Content-Length: header for OK responses that provides a correct value.
Add a 500 Internal Server Error response.
Instead of returning a script in webroot as plain text, execute the file and return the results as HTML.
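To get you started on the first two challenges, here is a small sketch of producing the two header values. The body is the minimal response from class; how you wire the headers into your own response functions is up to you.

```python
from email.utils import formatdate

body = b"this is a pretty minimal response"

headers = [
    # usegmt=True yields the "GMT" suffix that HTTP dates use.
    "Date: {}".format(formatdate(usegmt=True)),
    # Content-Length counts the bytes in the body, not the characters.
    "Content-Length: {}".format(len(body)),
]
print("\r\n".join(headers))

# With a fixed timestamp the format is easy to see:
print(formatdate(0, usegmt=True))  # Thu, 01 Jan 1970 00:00:00 GMT
```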