image: http://en.wikipedia.org/wiki/Internet_Protocol_Suite
The bottom layer is the ‘Link Layer’
Moving up, we have the ‘Internet Layer’
Next up is the ‘Transport Layer’
The ‘Transport Layer’ also establishes the concept of a port
This means that you don’t have to worry about information intended for your web browser being accidentally read by your email client.
There are certain ports which are commonly understood to belong to given applications or protocols:
These ports are often referred to as well-known ports
(see http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers)
Ports are grouped into a few different classes
The topmost layer is the ‘Application Layer’
this is where we live and work
Think back for a second to what we just finished discussing, the TCP/IP stack.
A Socket is the software representation of that endpoint.
Opening a socket creates a kind of transceiver that can send and/or receive bytes at a given IP address and Port.
Python's standard library includes a module that provides socket functionality. It is called socket.
The library is really just a very thin wrapper around the system implementation of BSD Sockets
Let’s spend a few minutes getting to know this module.
We’re going to do this next part together, so open up a terminal and start a Python interpreter.
The Python sockets library allows us to find out what port a service uses:
>>> import socket
>>> socket.getservbyname('ssh')
22
You can also do a reverse lookup, finding what service uses a given port:
>>> socket.getservbyport(80)
'http'
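Both lookups raise an error when the name or port is missing from the system's services database, so a script may want a fallback. Here is a minimal sketch, assuming Python 3 (where the error is OSError). The helper names port_for and service_for are made up for illustration and are not part of the socket module:

```python
import socket


def port_for(service, protocol='tcp'):
    """Return the port registered for a service name, or None if unknown."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        # Raised when the services database has no matching entry.
        return None


def service_for(port, protocol='tcp'):
    """Return the service name registered for a port, or None if unknown."""
    try:
        return socket.getservbyport(port, protocol)
    except OSError:
        return None


print(port_for('ssh'))
print(service_for(80))
print(port_for('no-such-service-name'))
```

What the first two calls print depends on the services database of the machine you run this on.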
The sockets library also provides tools for finding out information about hosts. For example, you can find out about the hostname and IP address of the machine you are currently using:
>>> socket.gethostname()
'heffalump.local'
>>> socket.gethostbyname(socket.gethostname())
'10.211.55.2'
You can also find out about machines that are located elsewhere, assuming you know their hostname. For example:
>>> socket.gethostbyname('google.com')
'173.194.33.4'
>>> socket.gethostbyname('uw.edu')
'128.95.155.135'
>>> socket.gethostbyname('crisewing.com')
'108.59.11.99'
The gethostbyname_ex method of the socket library provides more information about the machines we are exploring:
>>> socket.gethostbyname_ex('google.com')
('google.com', [], ['173.194.33.9', '173.194.33.14',
...
'173.194.33.6', '173.194.33.7',
'173.194.33.8'])
>>> socket.gethostbyname_ex('crisewing.com')
('crisewing.com', [], ['108.59.11.99'])
>>> socket.gethostbyname_ex('www.rad.washington.edu')
('elladan.rad.washington.edu', # <- canonical hostname
['www.rad.washington.edu'], # <- any machine aliases
['128.95.247.84']) # <- all active IP addresses
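The three parts of that result can be unpacked by name. Here is a small sketch, assuming Python 3. The describe_host helper is hypothetical, and a numeric address is used so that no DNS lookup is needed:

```python
import socket


def describe_host(name):
    """Unpack the 3-tuple returned by gethostbyname_ex into a dict."""
    canonical, aliases, addresses = socket.gethostbyname_ex(name)
    return {
        'canonical': canonical,   # primary hostname
        'aliases': aliases,       # alternative names, possibly empty
        'addresses': addresses,   # list of IPv4 address strings
    }


info = describe_host('127.0.0.1')
for address in info['addresses']:
    print(info['canonical'], address)
```

Swap in any hostname you like. Note that gethostbyname_ex only reports IPv4 addresses.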
To create a socket, you call the socket class of the socket module. Its constructor takes up to three optional positional arguments (here we use none to get the default behavior):
>>> foo = socket.socket()
>>> foo
<socket._socketobject object at 0x10046cec0>
A socket has some properties that are immediately important to us. These include the family, type and protocol of the socket:
>>> foo.family
2
>>> foo.type
1
>>> foo.proto
0
You might notice that the values for these properties are integers. In fact, these integers are constants defined in the socket library.
Let’s define a function in place to help us see these constants. It will take a single argument, the shared prefix for a defined set of constants:
>>> def get_constants(prefix):
...     """mapping of socket module constants to their names."""
...     return dict(
...         (getattr(socket, n), n)
...         for n in dir(socket)
...         if n.startswith(prefix)
...     )
...
>>>
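One thing to keep in mind with a value-to-name mapping like this: two constants may share the same integer, in which case only one name survives in the dict. A variation that keeps every name might look like the sketch below, written for Python 3. The name constant_names is an invention for this example:

```python
import socket
from collections import defaultdict


def constant_names(prefix):
    """Map each constant value to every socket-module name that carries it."""
    names = defaultdict(list)
    for name in dir(socket):
        if name.startswith(prefix):
            value = getattr(socket, name)
            if isinstance(value, int):
                # int() strips the enum wrapper Python 3 puts on constants.
                names[int(value)].append(name)
    return dict(names)


families = constant_names('AF_')
print(families[int(socket.AF_INET)])
```

Looking constants up by name (socket.AF_INET) rather than by number keeps code portable, since the numbers differ between operating systems.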
Think back a moment to our discussion of the Internet layer of the TCP/IP stack. There were a couple of different types of IP addresses:
The family of a socket corresponds to the addressing system it uses for connecting.
Families defined in the socket library are prefixed by AF_:
>>> families = get_constants('AF_')
>>> families
{0: 'AF_UNSPEC', 1: 'AF_UNIX', 2: 'AF_INET',
11: 'AF_SNA', 12: 'AF_DECnet', 16: 'AF_APPLETALK',
17: 'AF_ROUTE', 23: 'AF_IPX', 30: 'AF_INET6'}
Your results may vary
Of all of these, the ones we care most about are AF_INET (IPv4, shown as 2 above) and AF_INET6 (IPv6, shown as 30 above; this number differs between operating systems).
When you are on a machine with an operating system that is Unix-like, you will find another generally useful socket family: AF_UNIX, or Unix Domain Sockets. Sockets in this family:
What is the default family for the socket we created just a moment ago?
(remember we bound the socket to the symbol foo)
How did you figure this out?
The socket type determines the semantics of socket communications.
Look up socket type constants with the SOCK_ prefix:
>>> types = get_constants('SOCK_')
>>> types
{1: 'SOCK_STREAM', 2: 'SOCK_DGRAM',
...}
The most common are 1 (stream communication, used for TCP) and 2 (datagram communication, used for UDP).
What is the default type for our generic socket, foo?
A socket also has a designated protocol. The constants for these are prefixed by IPPROTO_:
>>> protocols = get_constants('IPPROTO_')
>>> protocols
{0: 'IPPROTO_IP', 1: 'IPPROTO_ICMP',
...,
255: 'IPPROTO_RAW'}
Which protocol you choose for a socket depends on the protocol you intend to use on top of IP. TCP? UDP? ICMP? IGMP?
What is the default protocol used by our generic socket, foo?
These three properties of a socket correspond to the three positional arguments you may pass to the socket constructor.
Using them allows you to create sockets with specific communications profiles:
>>> bar = socket.socket(socket.AF_INET,
... socket.SOCK_DGRAM,
... socket.IPPROTO_UDP)
...
>>> bar
<socket._socketobject object at 0x1005b8b40>
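To see the three arguments and the three properties line up, here is a short sketch that builds one stream socket and one datagram socket and reads their profiles back. It assumes Python 3.7 or later; the variable names are arbitrary:

```python
import socket

# A TCP-style socket: IPv4 addressing, stream semantics.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
# A UDP-style socket: IPv4 addressing, datagram semantics.
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

profiles = {}
for label, sock in (('tcp', tcp_sock), ('udp', udp_sock)):
    profiles[label] = (int(sock.family), int(sock.type), sock.proto)
    sock.close()  # release the file descriptor once we have looked

print(profiles)
```

No connection is made here; creating a socket only sets up the local endpoint.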
When you are creating a socket to communicate with a remote service, the remote socket will have a specific communications profile.
The local socket you create must match that communications profile.
How can you determine the correct values to use?
You ask.
The function socket.getaddrinfo provides information about available connections on a given host.
>>> socket.getaddrinfo('127.0.0.1', 80)
This provides all you need to make a proper connection to a socket on a remote host. The value returned is a list of tuples, each made up of the family, socket type, protocol, canonical name, and socket address.
Now, ask your own machine what possible connections are available for ‘http’:
>>> socket.getaddrinfo(socket.gethostname(), 'http')
[(2, 2, 17, '', ('10.29.144.178', 80)),
...
(30, 2, 17, '', ('fe80::e2f8:47ff:fe21:af92%en1', 80, 0, 5)),
...
]
...
>>>
What answers do you get?
>>> socket.getaddrinfo('crisewing.com', 'http')
[(2, 2, 17, '', ('108.168.213.86', 80)), (2, 1, 6, '', ('108.168.213.86', 80))]
>>>
Try a few other servers you know about.
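Each entry in a getaddrinfo result can be unpacked into named parts, which makes the output easier to read than raw tuples. Here is a sketch assuming Python 3. It uses a numeric loopback address so it runs without any DNS lookup:

```python
import socket

# A numeric address avoids any DNS lookup, so this runs offline.
results = socket.getaddrinfo('127.0.0.1', 80)

for family, socktype, proto, canonname, sockaddr in results:
    # sockaddr is (host, port) for IPv4 and a 4-tuple for IPv6.
    print(family, socktype, proto, repr(canonname), sockaddr)

# Keep only the stream (TCP-style) candidates, as a client would.
stream_results = [r for r in results if r[1] == socket.SOCK_STREAM]
```

The first three items of any entry can be passed straight to the socket constructor, and the last one to connect.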
Sockets communicate by sending and receiving messages.
Let’s test this by building a client socket and communicating with a server.
First, connect and send a message:
>>> streams = [info
... for info in socket.getaddrinfo('crisewing.com', 'http')
... if info[1] == socket.SOCK_STREAM]
>>> info = streams[0]
>>> cewing_socket = socket.socket(*info[:3])
>>> cewing_socket.connect(info[-1])
>>> msg = "GET / HTTP/1.1\r\n"
>>> msg += "Host: crisewing.com\r\n\r\n"
>>> cewing_socket.sendall(msg)
>>> cewing_socket.shutdown(socket.SHUT_WR)
Then, receive a reply, iterating until it is complete:
>>> buffsize = 4096
>>> response = ''
>>> done = False
>>> while not done:
...     msg_part = cewing_socket.recv(buffsize)
...     if len(msg_part) < buffsize:
...         done = True
...     response += msg_part
...
>>> len(response)
19427
>>> cewing_socket.shutdown(socket.SHUT_RD)
>>> cewing_socket.close()
There are two basic methods on a socket for sending messages, send and sendall. We’re using the latter here.
With send, you send the message one chunk at a time. You are responsible for checking if a particular chunk succeeded or not, and you are also responsible for determining when the full transmission is done.
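To make that bookkeeping concrete, here is a sketch of the loop you would write around send, which is roughly the work sendall does for you. It assumes Python 3, where socket data is bytes. The function name send_in_chunks is hypothetical, and socketpair stands in for a real network connection:

```python
import socket


def send_in_chunks(sock, data):
    """Send all of data using send(), tracking progress by hand."""
    total_sent = 0
    while total_sent < len(data):
        # send() returns how many bytes were accepted, possibly fewer
        # than we offered.
        sent = sock.send(data[total_sent:])
        if sent == 0:
            raise RuntimeError('socket connection broken')
        total_sent += sent
    return total_sent


# socketpair() gives two already-connected sockets, handy for a local demo.
left, right = socket.socketpair()
message = b'GET / HTTP/1.1\r\n\r\n'
count = send_in_chunks(left, message)
received = right.recv(1024)
left.close()
right.close()
```

In everyday code sendall is usually the simpler choice; the loop is worth knowing for cases where you need to track progress yourself.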
The recv method handles incoming messages in buffers.
Notice that receiving a message is not a one-and-done kind of thing.
We don’t know how big the incoming message is before we start receiving it.
As a result, we have to use the Accumulator pattern to gather incoming buffers of the message until there is no more to get.
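In this exchange, the recv method returns a string shorter than buffsize when there isn’t any more to come. Strictly speaking, a short read only means that fewer bytes were available at that moment; the unambiguous signal is an empty string, which recv returns once the other side has finished sending.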
Sockets do not have a concept of the “End Of Transmission”.
So what happens if the message coming in is an exact multiple of the buffsize?
There are a couple of strategies for dealing with this. One is to punt to the application level protocol and allow it to predetermine the size of the message to come. HTTP works this way: a message can announce the size of its body in a Content-Length header.
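As an illustration of the first strategy, the sketch below reads an HTTP-style message whose headers announce the body size, so the receiver knows exactly when to stop. It assumes Python 3 and is deliberately not a full HTTP parser. The function read_http_style_message and the tiny buffsize are choices made for this example:

```python
import socket


def read_http_style_message(sock, buffsize=16):
    """Read headers, then exactly Content-Length bytes of body."""
    data = b''
    # Accumulate until the blank line that ends the headers.
    while b'\r\n\r\n' not in data:
        chunk = sock.recv(buffsize)
        if not chunk:
            raise RuntimeError('connection closed before headers ended')
        data += chunk
    head, _, body = data.partition(b'\r\n\r\n')
    length = 0
    for line in head.split(b'\r\n'):
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip())
    # Keep receiving until the announced number of body bytes has arrived.
    while len(body) < length:
        chunk = sock.recv(buffsize)
        if not chunk:
            raise RuntimeError('connection closed before body was complete')
        body += chunk
    return head, body[:length]


sender, receiver = socket.socketpair()
payload = b'x' * 32  # an exact multiple of buffsize, the awkward case
sender.sendall(b'Content-Length: 32\r\n\r\n' + payload)
head, body = read_http_style_message(receiver)
sender.close()
receiver.close()
```

Because the loop counts bytes instead of watching for a short read, the sender does not need to close or shut down the connection for the receiver to finish.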
The other is to use the shutdown method of a socket to close that socket for reading, writing or both.
When you do so, a 0-byte message is sent to the partner socket, allowing it to know that you are finished.
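Here is a sketch of the shutdown strategy, assuming Python 3 and using socketpair in place of a real network connection. The message is an exact multiple of buffsize on purpose, and the receiver stops only when recv returns an empty result:

```python
import socket

buffsize = 16
sender, receiver = socket.socketpair()

message = b'a' * (buffsize * 4)  # exact multiple of buffsize
sender.sendall(message)
# Close only the writing half; the receiver will see end-of-stream.
sender.shutdown(socket.SHUT_WR)

response = b''
while True:
    msg_part = receiver.recv(buffsize)
    if not msg_part:
        # An empty result means the other side has finished sending.
        break
    response += msg_part

sender.close()
receiver.close()
```

Testing for an empty result rather than a short one means the loop behaves the same no matter how the message length relates to buffsize.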
For more information, read the Python Socket Programming How-To.
Tonight you’ll put this to work, first by walking through a basic client server interaction, then by building a basic echo server and client.