Wherein we learn how computers speak to each other over a network.
Class presentations are available online for your use
https://github.com/UWPCE-PythonCert/training.python_web
Licensed with Creative Commons BY-NC-SA
Find mistakes? See improvements? Make a pull request.
The rendered documentation is available as well:
http://uwpce-pythoncert.github.io
Please check frequently. I will update with great regularity.
Classroom Protocol
Questions to ask:
Questions not to ask:
Introductions
The bottom layer is the 'Link Layer'
Moving up, we have the 'Internet Layer'
That's 4.3 x 10^28 addresses per person alive today
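The figure above can be checked with a little arithmetic. This is only a rough sketch: it assumes the number refers to the 128-bit IPv6 address space and a world population of about 8 billion (an assumption, not a number from the class). The variable names are made up for illustration.

```python
# IPv6 addresses are 128 bits long, so the address space holds 2**128 values.
ipv6_addresses = 2 ** 128

# Assumed world population (roughly 8 billion); adjust to taste.
world_population = 8_000_000_000

per_person = ipv6_addresses / world_population
print('{:.1e} addresses per person'.format(per_person))
```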
Next up is the 'Transport Layer'
The 'Transport Layer' also establishes the concept of a port
This means that you don't have to worry about information intended for your web browser being accidentally read by your email client.
There are certain ports which are commonly understood to belong to given applications or protocols:
These ports are often referred to as well-known ports
(see http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers)
Ports are grouped into a few different classes
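As a rough illustration, the sketch below sorts a port number into the three ranges that IANA defines: well-known (0-1023), registered (1024-49151) and dynamic or private (49152-65535). The helper name port_class is hypothetical and is not part of the socket library.

```python
def port_class(port):
    """Return the name of the IANA range a port falls in (hypothetical helper)."""
    if not 0 <= port <= 65535:
        raise ValueError('port must be between 0 and 65535')
    if port <= 1023:
        return 'well-known'
    if port <= 49151:
        return 'registered'
    return 'dynamic'


for port in (22, 80, 8080, 50000):
    print(port, port_class(port))
```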
The topmost layer is the 'Application Layer'
this is where we live and work
Think back for a second to what we just finished discussing, the TCP/IP stack.
A Socket is the software representation of that endpoint.
Opening a socket creates a kind of transceiver that can send and/or receive bytes at a given IP address and Port.
Python provides a standard library module which provides socket functionality. It is called socket.
The library is really just a very thin wrapper around the system implementation of BSD Sockets
Let's spend a few minutes getting to know this module.
We're going to do this next part together, so open up a terminal and start an IPython interpreter.
The Python sockets library allows us to find out what port a service uses:
In [1]: import socket
In [2]: socket.getservbyname('ssh')
Out[2]: 22
You can also do a reverse lookup, finding what service uses a given port:
In [3]: socket.getservbyport(80)
Out[3]: 'http'
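These lookups read the services database on your own machine, so a name the system does not know raises an OSError. A minimal sketch of a forgiving lookup might look like this (lookup_port is a hypothetical helper name, and the result for 'http' depends on your system):

```python
import socket


def lookup_port(service, protocol='tcp'):
    """Return the port for a service name, or None if the system doesn't know it."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        # Raised when the service/protocol pair is not in the services database.
        return None


print(lookup_port('http'))
print(lookup_port('no-such-service-xyz'))
```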
The sockets library also provides tools for finding out information about hosts. For example, you can find out about the hostname and IP address of the machine you are currently using:
In [4]: socket.gethostname()
Out[4]: 'Banks'
In [5]: socket.gethostbyname(socket.gethostname())
Out[5]: '127.0.0.1'
You can also find out about machines that are located elsewhere, assuming you know their hostname. For example:
In [6]: socket.gethostbyname('google.com')
Out[6]: '173.194.33.100'
In [7]: socket.gethostbyname('uw.edu')
Out[7]: '128.95.155.134'
In [8]: socket.gethostbyname('crisewing.com')
Out[8]: '108.168.213.86'
The gethostbyname_ex function of the socket library provides more information about the machines we are exploring:
In [9]: socket.gethostbyname_ex('crisewing.com')
Out[9]: ('crisewing.com', [], ['108.168.213.86'])
In [10]: socket.gethostbyname_ex('google.com')
Out[10]:
('google.com',
[],
['173.194.33.100', '173.194.33.103',
...
'173.194.33.97', '173.194.33.104'])
To create a socket, you use the socket constructor of the socket library.
It takes up to three optional positional arguments (here we use none to get the default behavior):
In [11]: foo = socket.socket()
In [12]: foo
Out[12]: <socket.socket fd=10, family=AddressFamily.AF_INET,
type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)>
A socket has some properties that are immediately important to us. These include the family, type and protocol of the socket:
In [13]: foo.family
Out[13]: <AddressFamily.AF_INET: 2>
In [14]: foo.type
Out[14]: <SocketKind.SOCK_STREAM: 1>
In [15]: foo.proto
Out[15]: 0
You might notice that the values for these properties are integers. In fact, these integers are constants defined in the socket library.
Let's define a function in place to help us see these constants. It will take a single argument, the shared prefix for a defined set of constants (you can also find this in resources/session01/socket_tools.py):
In [37]: def get_constants(prefix):
   ....:     """mapping of socket module constants to their names"""
   ....:     return {getattr(socket, n): n
   ....:             for n in dir(socket)
   ....:             if n.startswith(prefix)
   ....:             }
   ....:
Think back a moment to our discussion of the Internet layer of the TCP/IP stack. There were a couple of different types of IP addresses:
The family of a socket corresponds to the addressing system it uses for connecting.
Families defined in the socket library are prefixed by AF_:
In [39]: families = get_constants('AF_')
In [40]: families
Out[40]:
{<AddressFamily.AF_UNSPEC: 0>: 'AF_UNSPEC',
<AddressFamily.AF_UNIX: 1>: 'AF_UNIX',
<AddressFamily.AF_INET: 2>: 'AF_INET',
...
<AddressFamily.AF_INET6: 30>: 'AF_INET6',
<AddressFamily.AF_SYSTEM: 32>: 'AF_SYSTEM'}
Your results may vary
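Of all of these, the ones we care most about are AF_INET (2, IPv4) and AF_INET6 (30 in the output above, IPv6). The number behind AF_INET6 differs between platforms (on Linux it is 10), so refer to it by name.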
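Because the integer behind a family constant can differ from one operating system to another, it is usually safer to compare against the named constant than against a literal number. A small sketch of the idea (the variable names are only for illustration):

```python
import socket

# Members of AddressFamily are enum values: they compare equal to
# integers but also carry a readable name.
print(socket.AF_INET.name, int(socket.AF_INET))
print(socket.AF_INET6.name, int(socket.AF_INET6))

sock = socket.socket()
# Compare against the named constant rather than a literal such as 2.
is_ipv4 = sock.family == socket.AF_INET
sock.close()
print(is_ipv4)
```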
When you are on a machine with a Unix-like operating system, you will find another generally useful socket family: AF_UNIX, or Unix Domain Sockets. Sockets in this family:
What is the default family for the socket we created just a moment ago? (Remember, we bound the socket to the symbol foo.)
How did you figure this out?
The socket type determines the semantics of socket communications.
Look up socket type constants with the SOCK_ prefix:
In [42]: types = get_constants('SOCK_')
In [43]: types
Out[43]:
{<SocketKind.SOCK_STREAM: 1>: 'SOCK_STREAM',
<SocketKind.SOCK_DGRAM: 2>: 'SOCK_DGRAM',
<SocketKind.SOCK_RAW: 3>: 'SOCK_RAW',
<SocketKind.SOCK_RDM: 4>: 'SOCK_RDM',
<SocketKind.SOCK_SEQPACKET: 5>: 'SOCK_SEQPACKET'}
The most common are 1 (stream communication, TCP) and 2 (datagram communication, UDP).
What is the default type for our generic socket, foo?
A socket also has a designated protocol. The constants for these are prefixed by IPPROTO_:
In [45]: protocols = get_constants('IPPROTO_')
In [46]: protocols
Out[46]:
{0: 'IPPROTO_IP',
...
6: 'IPPROTO_TCP',
...
17: 'IPPROTO_UDP',
...}
The choice of which protocol to use for a socket is determined by the network protocol you intend to use. TCP? UDP? ICMP? IGMP?
What is the default protocol used by our generic socket, foo?
These three properties of a socket correspond to the three positional arguments you may pass to the socket constructor.
Using them allows you to create sockets with specific communications profiles:
In [3]: socket.socket(socket.AF_INET,
   ...:               socket.SOCK_DGRAM,
   ...:               socket.IPPROTO_UDP)
Out[3]: <socket.socket fd=7, family=AddressFamily.AF_INET,
type=SocketKind.SOCK_DGRAM, proto=17, laddr=('0.0.0.0', 0)>
So far we have:
When we return we'll learn how to find the communications profiles of remote sockets, how to connect to them, and how to send and receive messages.
Take a few minutes now to clear your head (do not quit your Python interpreter).
When you are creating a socket to communicate with a remote service, the remote socket will have a specific communications profile.
The local socket you create must match that communications profile.
How can you determine the correct values to use?
You ask.
The function socket.getaddrinfo provides information about available connections on a given host.
socket.getaddrinfo('127.0.0.1', 80)
This provides all you need to make a proper connection to a socket on a remote host. The value returned is a list of tuples, each made up of the socket family, type, protocol, canonical name and socket address.
Again, let's create a utility function in place so we can see this in action:
In [10]: def get_address_info(host, port):
   ....:     for response in socket.getaddrinfo(host, port):
   ....:         fam, typ, pro, nam, add = response
   ....:         print('family: {}'.format(families[fam]))
   ....:         print('type: {}'.format(types[typ]))
   ....:         print('protocol: {}'.format(protocols[pro]))
   ....:         print('canonical name: {}'.format(nam))
   ....:         print('socket address: {}'.format(add))
   ....:         print('')
   ....:
(you can also find this in resources/session01/socket_tools.py)
Now, ask your own machine what possible connections are available for 'http':
In [11]: get_address_info(socket.gethostname(), 'http')
family: AF_INET
type: SOCK_DGRAM
protocol: IPPROTO_UDP
canonical name:
socket address: ('127.0.0.1', 80)
family: AF_INET
type: SOCK_STREAM
protocol: IPPROTO_TCP
canonical name:
socket address: ('127.0.0.1', 80)
What answers do you get?
In [12]: get_address_info('crisewing.com', 'http')
family: AF_INET
type: SOCK_DGRAM
protocol: IPPROTO_UDP
canonical name:
socket address: ('108.168.213.86', 80)
family: AF_INET
type: SOCK_STREAM
protocol: IPPROTO_TCP
canonical name:
socket address: ('108.168.213.86', 80)
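If you only care about one kind of connection, getaddrinfo can do the filtering for you through its optional family and type arguments. The sketch below asks only for IPv4 stream (TCP) results and uses a numeric address so that no name lookup is needed; treat it as one possible approach rather than the way the class code does it.

```python
import socket

# Ask only for IPv4 stream (TCP) results. A numeric address is used here
# so that no DNS lookup is needed.
results = socket.getaddrinfo('127.0.0.1', 80,
                             family=socket.AF_INET,
                             type=socket.SOCK_STREAM)

for family, socktype, proto, canonname, sockaddr in results:
    print(family.name, socktype.name, proto, repr(canonname), sockaddr)
```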
Let's put this to use
We'll communicate with a remote server as a client
We've already made a socket foo using the generic constructor without any arguments. We can make a better one now by using real address information from a real server online [do not type this yet]:
In [13]: streams = [info
   ....:     for info in socket.getaddrinfo('crisewing.com', 'http')
   ....:     if info[1] == socket.SOCK_STREAM]
   ....:
In [14]: streams
Out[14]:
[(<AddressFamily.AF_INET: 2>,
<SocketKind.SOCK_STREAM: 1>,
6,
'',
('108.168.213.86', 80))]
In [15]: info = streams[0]
In [16]: cewing_socket = socket.socket(*info[:3])
Once the socket is constructed with the appropriate family, type and protocol, we can connect it to the address of our remote server:
In [18]: cewing_socket.connect(info[-1])
A successful connection returns None; a failed connection raises an error.
Send a message to the server on the other end of our connection (we'll learn in session 2 about the message we are sending):
In [19]: msg = "GET / HTTP/1.1\r\n"
In [20]: msg += "Host: crisewing.com\r\n\r\n"
In [21]: msg = msg.encode('utf8')
In [22]: msg
Out[22]: b'GET / HTTP/1.1\r\nHost: crisewing.com\r\n\r\n'
In [23]: cewing_socket.sendall(msg)
A successful call to sendall also returns None; an error is raised if the transmission fails.
One detail from the previous code should stand out:
In [21]: msg = msg.encode('utf8')
In [22]: msg
Out[22]: b'GET / HTTP/1.1\r\nHost: crisewing.com\r\n\r\n'
You can only send bytes through a socket, never unicode text (str):
In [35]: cewing_socket.sendall(msg.decode('utf8'))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-35-8178ec7f234d> in <module>()
----> 1 cewing_socket.sendall(msg.decode('utf8'))
TypeError: 'str' does not support the buffer interface
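You can try the bytes-versus-text rule without a remote server. The sketch below uses socket.socketpair(), which returns two sockets that are already connected to each other; the variable names are only for illustration. The exact wording of the TypeError message varies between Python versions, so the sketch only checks that one is raised.

```python
import socket

# socketpair() returns two already-connected sockets, handy for local tests.
left, right = socket.socketpair()

try:
    left.sendall('hello')            # a str, not bytes
    str_accepted = True
except TypeError:
    str_accepted = False

left.sendall('hello'.encode('utf8'))  # bytes are fine
received = right.recv(1024)

left.close()
right.close()
print(str_accepted, received, received.decode('utf8'))
```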
Whatever reply we get is received by the socket we created. We can read it back out (again, do not type this yet):
In [24]: response = cewing_socket.recv(4096)
In [25]: response[:60]
Out[25]: b'HTTP/1.1 200 OK\r\nServer: nginx\r\nDate: Sun, 20 Sep 2015 03:38'
The recv method takes a single positional argument, buffer_size (an integer). It should be a power of 2 and smallish (~4096).
It returns a byte string of length buffer_size (or smaller if less data was received).
If the response is longer than buffer_size, you can call the method repeatedly. The last bunch will usually be smaller than buffer_size.
When you are finished with a connection, you should always close it:
cewing_socket.close()
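One way to make sure a socket always gets closed is to use it as a context manager, which socket objects have supported since Python 3.2. This is a small sketch of that option, not a replacement for the explicit close() shown above:

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    open_fd = sock.fileno()   # a non-negative file descriptor while open

closed_fd = sock.fileno()     # -1 once the socket has been closed
print(open_fd, closed_fd)
```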
First, connect and send a message:
In [55]: info = socket.getaddrinfo('crisewing.com', 'http')
In [56]: streams = [i for i in info if i[1] == socket.SOCK_STREAM]
In [57]: sock_info = streams[0]
In [58]: msg = "GET / HTTP/1.1\r\n"
In [59]: msg += "Host: crisewing.com\r\n\r\n"
In [60]: msg = msg.encode('utf8')
In [61]: cewing_socket = socket.socket(*sock_info[:3])
In [62]: cewing_socket.connect(sock_info[-1])
In [63]: cewing_socket.sendall(msg)
Then, receive a reply, iterating until it is complete:
In [65]: buffsize = 4096
In [66]: response = b''
In [67]: done = False
In [68]: while not done:
   ....:     msg_part = cewing_socket.recv(buffsize)
   ....:     if len(msg_part) < buffsize:
   ....:         done = True
   ....:         cewing_socket.close()
   ....:     response += msg_part
   ....:
In [69]: len(response)
Out[69]: 19464
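The loop above treats a short chunk as the end of the message. That works here, but recv can also return a short chunk in the middle of a longer message. A common alternative, sketched below, is to keep reading until recv returns an empty byte string, which signals that the other side has closed the connection. It assumes the peer really does close when it is done; an HTTP/1.1 server may keep the connection open. The helper name recv_all is hypothetical, and the demonstration uses a local socketpair rather than a real server.

```python
import socket


def recv_all(sock, buffsize=4096):
    """Read from sock until the peer closes its end (hypothetical helper)."""
    chunks = []
    while True:
        chunk = sock.recv(buffsize)
        if not chunk:          # b'' means the other side has closed
            break
        chunks.append(chunk)
    return b''.join(chunks)


# Local demonstration with a connected pair of sockets.
sender, receiver = socket.socketpair()
sender.sendall(b'x' * 2500)
sender.close()                 # closing tells the receiver no more data is coming
data = recv_all(receiver, buffsize=1024)
receiver.close()
print(len(data))
```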
What about the other half of the equation?
Let's build a server and see how that part works.
For the moment, stop typing this into your interpreter.
Again, we begin by constructing a socket. Since we are actually the server this time, we get to choose family, type and protocol:
In [70]: server_socket = socket.socket(
   ....:     socket.AF_INET,
   ....:     socket.SOCK_STREAM,
   ....:     socket.IPPROTO_TCP)
In [71]: server_socket
Out[71]: <socket.socket fd=12, family=AddressFamily.AF_INET,
type=SocketKind.SOCK_STREAM, proto=6, laddr=('0.0.0.0', 0)>
Our server socket needs to be bound to an address. This is the IP Address and Port to which clients must connect:
In [72]: address = ('127.0.0.1', 50000)
In [73]: server_socket.bind(address)
Terminology Note: In a server/client relationship, the server binds to an address and port. The client connects.
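Two details often come up when binding, sketched here as optional extras rather than anything the class code requires. Binding to port 0 asks the operating system to pick any free port, and getsockname() reports which one it chose. Setting SO_REUSEADDR lets you restart a server on the same address without waiting for the old socket to time out.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow quick restarts of a server on the same address.
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Port 0 asks the operating system to choose any free port.
server.bind(('127.0.0.1', 0))
host, port = server.getsockname()
server.close()
print(host, port)
```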
Once our socket is bound to an address, we can listen for attempted connections:
In [74]: server_socket.listen(1)
The argument to listen is the backlog, the maximum number of connection requests the socket will queue.
When a socket is listening, it can receive incoming connection requests:
In [75]: connection, client_address = server_socket.accept()
socket.accept() is a blocking call. It will not return values until a client connects.
The connection returned by a call to accept is a new socket. This new socket is used to communicate with the client.
client_address is a two-tuple of IP address and port for the client socket.
The connection socket can now be used to receive messages from the client which made the connection:
In [76]: connection.recv(buffsize)
It may also be used to return a reply:
In [77]: connection.sendall("message received".encode('utf8'))
Once a transaction between the client and server is complete, the connection socket should be closed:
In [78]: connection.close()
At this point, the server_socket can again accept a new client connection.
Note that the server_socket is never closed as long as the server continues to run.
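Because accept blocks until a client arrives, it can be handy while experimenting to give the listening socket a timeout so the call eventually gives up. The sketch below is one way to see that behavior on its own; the 0.2 second timeout and the use of port 0 (let the OS choose) are arbitrary choices made for the example.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))      # port 0: let the OS choose a free port
server.listen(1)
server.settimeout(0.2)             # wait at most 0.2 seconds in accept()

try:
    connection, client_address = server.accept()
    timed_out = False
    connection.close()
except socket.timeout:
    # No client connected within the timeout.
    timed_out = True
finally:
    server.close()

print(timed_out)
```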
The flow of this interaction can be a bit confusing. Let's see it in action step-by-step.
In your first python interpreter, create a server socket and prepare it for connections:
In [81]: server_socket = socket.socket(
   ....:     socket.AF_INET,
   ....:     socket.SOCK_STREAM,
   ....:     socket.IPPROTO_IP)
In [82]: server_socket.bind(('127.0.0.1', 50000))
In [83]: server_socket.listen(1)
In [84]: conn, addr = server_socket.accept()
At this point, you should not get back a prompt. The server socket is waiting for a connection to be made.
In your second interpreter, create a client socket and prepare to send a message:
In [1]: import socket
In [2]: client_socket = socket.socket(
   ...:     socket.AF_INET,
   ...:     socket.SOCK_STREAM,
   ...:     socket.IPPROTO_IP)
Before connecting, keep your eye on the server interpreter:
In [3]: client_socket.connect(('127.0.0.1', 50000))
As soon as you made the connection above, you should have seen the prompt return in your server interpreter. The accept method finally returned a new connection socket.
When you're ready, type the following in the client interpreter:
In [4]: client_socket.sendall('Hey, can you hear me?'.encode('utf8'))
Back in your server interpreter, go ahead and receive the message from your client:
In [87]: msg = conn.recv(4096)
In [88]: msg
Out[88]: b'Hey, can you hear me?'
Send a message back, and then close up your connection:
In [89]: conn.sendall('Yes, I can hear you.'.encode('utf8'))
In [90]: conn.close()
Back in your client interpreter, take a look at the response to your message, then be sure to close your client socket too:
In [5]: from_server = client_socket.recv(4096)
In [6]: from_server
Out[6]: b'Yes, I can hear you.'
In [7]: client_socket.close()
And now that we're done, we can close up the server socket too (back in the server interpreter):
In [91]: server_socket.close()
You've run your first client-server interaction
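If you would like to replay the whole exchange without juggling two interpreters, the sketch below runs the server side in a background thread and the client in the main thread. It is only an illustration of the flow: the helper name serve_once is made up, and port 0 is used so the operating system picks a free port instead of the 50000 used above.

```python
import socket
import threading


def serve_once(server_socket):
    """Accept one client, read its message and send a fixed reply."""
    conn, addr = server_socket.accept()
    try:
        conn.recv(4096)
        conn.sendall('Yes, I can hear you.'.encode('utf8'))
    finally:
        conn.close()


server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(('127.0.0.1', 0))   # port 0: the OS picks a free port
server_socket.listen(1)
address = server_socket.getsockname()

# Run the blocking accept() in a thread so the client can run below.
worker = threading.Thread(target=serve_once, args=(server_socket,))
worker.start()

client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(address)
client_socket.sendall('Hey, can you hear me?'.encode('utf8'))
from_server = client_socket.recv(4096)
client_socket.close()

worker.join()
server_socket.close()
print(from_server)
```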
Your homework assignment for this week is to take what you've learned here and build a simple "echo" server.
The server should automatically return to any client that connects exactly what it receives (it should echo all messages).
You will also write a Python script that, when run, will send a message to the server and receive the reply, printing it to stdout.
Finally, you'll do all of this so that it can be tested.
In our class repository, there is a folder resources/session01.
Inside that folder, you should find:
tasks.txt, which contains these instructions
echo_server.py
echo_client.py
tests.py
Your task is to make the tests pass.
To run the tests, you'll have to set the server running in one terminal:
$ python echo_server.py
Then, in a second terminal, you will execute the tests:
$ python tests.py
You should see output like this:
[...]
FAILED (failures=2)
To submit your homework:
Create a repository called echo_sockets.
Put the echo_server.py, echo_client.py and tests.py files in this repository.
We will clone your repository and run the tests as described above.
And we'll make comments inline on your repository.
In resources/session01/tasks.txt you'll find a few extra problems to try.
If you finish the first part of the homework in less than 3-4 hours, give one or more of these a whirl.
They are not required, but if you include solutions in your repository, we'll review your work.