*****************************************
Consuming Data from a RESTful Web Service
*****************************************

As an example of a RESTful web service, let's add some more information to
our list of apartment rentals from previous exercises. We'll use a common,
public API provided by Google.

.. class:: incremental center

**Geocoding**

Geocoding with Google APIs
==========================

https://developers.google.com/maps/documentation/geocoding

Open a Python interpreter using your ``souptests`` virtualenv:

.. code-block:: bash

    [souptests] heffalump:souptests cewing$ python

Then import the ``requests`` library and prepare to make an HTTP request to
the Google geocoding service resource:

.. code-block:: pycon

    >>> import requests
    >>> import json
    >>> from pprint import pprint
    >>> url = 'http://maps.googleapis.com/maps/api/geocode/json'
    >>> addr = '511 Boren Ave. N, Seattle, 98109'
    >>> parameters = {'address': addr, 'sensor': 'false'}
    >>> resp = requests.get(url, params=parameters)
    >>> data = json.loads(resp.text)
    >>> if data['status'] == 'OK':
    ...     pprint(data)
    ...
    {u'results': [{u'address_components': [{u'long_name': u'511',
                                            u'short_name': u'511',
                                            u'types': [u'street_number']},
                                           {u'long_name': u'Boren Avenue North',
                                            u'short_name': u'Boren Ave N',
                                            u'types': [u'route']},
                                           {u'long_name': u'South Lake Union',
                                            u'short_name': u'SLU',
                                            u'types': [u'neighborhood',
                                                       u'political']},
                                           {u'long_name': u'Seattle',
                                            u'short_name': u'Seattle',
                                            u'types': [u'locality',
                                                       u'political']},
                                           {u'long_name': u'King County',
                                            u'short_name': u'King County',
                                            u'types': [u'administrative_area_level_2',
                                                       u'political']},
                                           {u'long_name': u'Washington',
                                            u'short_name': u'WA',
                                            u'types': [u'administrative_area_level_1',
                                                       u'political']},
                                           {u'long_name': u'United States',
                                            u'short_name': u'US',
                                            u'types': [u'country',
                                                       u'political']},
                                           {u'long_name': u'98109',
                                            u'short_name': u'98109',
                                            u'types': [u'postal_code']}],
                   u'formatted_address': u'511 Boren Avenue North, Seattle, WA 98109, USA',
                   u'geometry': {u'location': {u'lat': 47.6235481,
                                               u'lng': -122.336212},
                                 u'location_type': u'ROOFTOP',
                                 u'viewport': {u'northeast': {u'lat': 47.6248970802915,
                                                              u'lng': -122.3348630197085},
                                               u'southwest': {u'lat': 47.6221991197085,
                                                              u'lng': -122.3375609802915}}},
                   u'types': [u'street_address']}],
     u'status': u'OK'}
    >>>
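Often the only piece of that response you actually need is the coordinate
pair. Here is a minimal sketch of a helper that wraps the lookup; the
``geocode`` name, the ``GEOCODE_URL`` constant, and the ``None``-on-failure
behavior are our own choices for illustration, not part of the API:

.. code-block:: python

    import json

    import requests

    GEOCODE_URL = 'http://maps.googleapis.com/maps/api/geocode/json'


    def geocode(address):
        """Return a (lat, lng) tuple for an address, or None if the lookup fails."""
        parameters = {'address': address, 'sensor': 'false'}
        resp = requests.get(GEOCODE_URL, params=parameters)
        resp.raise_for_status()  # surface HTTP-level errors immediately
        data = json.loads(resp.text)
        if data['status'] != 'OK':  # the API reports logical failures in 'status'
            return None
        location = data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])

With that in place, ``geocode('511 Boren Ave. N, Seattle, 98109')`` should
return ``(47.6235481, -122.336212)``, matching the output above.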
You can also do the reverse: provide a location as latitude and longitude
and receive address information back:

.. code-block:: pycon

    >>> location = data['results'][0]['geometry']['location']
    >>> latlng = "{lat},{lng}".format(**location)
    >>> parameters = {'latlng': latlng, 'sensor': 'false'}
    >>> resp = requests.get(url, params=parameters)
    >>> data = json.loads(resp.text)
    >>> if data['status'] == 'OK':
    ...     pprint(data)
    ...
    {u'results': [{u'address_components': [{u'long_name': u'511',
                                            u'short_name': u'511',
                                            u'types': [u'street_number']},
                                           {u'long_name': u'Boren Avenue North',
                                            u'short_name': u'Boren Ave N',
                                            u'types': [u'route']},
                                           {u'long_name': u'South Lake Union',
                                            u'short_name': u'SLU',
                                            u'types': [u'neighborhood',
                                                       u'political']},
                                           {u'long_name': u'Seattle',
                                            u'short_name': u'Seattle',
                                            u'types': [u'locality',
                                                       u'political']},
                                           {u'long_name': u'King County',
                                            u'short_name': u'King County',
                                            u'types': [u'administrative_area_level_2',
                                                       u'political']},
                                           {u'long_name': u'Washington',
                                            u'short_name': u'WA',
                                            u'types': [u'administrative_area_level_1',
                                                       u'political']},
                                           {u'long_name': u'United States',
                                            u'short_name': u'US',
                                            u'types': [u'country',
                                                       u'political']},
                                           {u'long_name': u'98109',
                                            u'short_name': u'98109',
                                            u'types': [u'postal_code']}],
                   u'formatted_address': u'511 Boren Avenue North, Seattle, WA 98109, USA',
                   u'geometry': {u'location': {u'lat': 47.6235481,
                                               u'lng': -122.336212},
                                 u'location_type': u'ROOFTOP',
                                 u'viewport': {u'northeast': {u'lat': 47.6248970802915,
                                                              u'lng': -122.3348630197085},
                                               u'southwest': {u'lat': 47.6221991197085,
                                                              u'lng': -122.3375609802915}}},
                   u'types': [u'street_address']},
                  ...],
     u'status': u'OK'}
    >>>

Notice that the response actually contains a number of results. These are
decreasingly specific designations for the location you provided. The
``types`` values indicate the level of geographical specificity of each
result.

Mashup!
=======

Let's create a simple mashup of this data with the apartment listings we
built by scraping Craigslist in an earlier exercise.

Open your ``scraper.py`` file in your editor and add a new function. Call
it ``add_address``. This function should:

* take a single listing from our craigslist work
* format the location data provided in that listing properly
* make a reverse geocoding lookup using the Google API above
* add the best available address to the listing
* return the updated listing

Can you write this function without looking at the solution below? Try it.

Solution
--------

Here are the changes I made to ``scraper.py`` to add this function:

.. code-block:: python

    # add an import
    import json

    # and a function
    def add_address(listing):
        api_url = 'http://maps.googleapis.com/maps/api/geocode/json'
        loc = listing['location']
        latlng_tmpl = "{data-latitude},{data-longitude}"
        parameters = {
            'sensor': 'false',
            'latlng': latlng_tmpl.format(**loc),
        }
        resp = requests.get(api_url, params=parameters)
        resp.raise_for_status()  # <- this is a no-op if all is well
        data = json.loads(resp.text)
        if data['status'] == 'OK':
            best = data['results'][0]
            listing['address'] = best['formatted_address']
        else:
            listing['address'] = 'unavailable'
        return listing

You'll need to bolt the new function into your script so that the results
it gives are added to each listing. Make the following changes to your
``__main__`` block:

.. code-block:: python

    if __name__ == '__main__':
        import pprint
        if len(sys.argv) > 1 and sys.argv[1] == 'test':
            html, encoding = read_search_results()
        else:
            html, encoding = fetch_search_results(
                minAsk=500, maxAsk=1000, bedrooms=2
            )
        doc = parse_source(html, encoding)
        # above here is the same
        for listing in extract_listings(doc):  # change everything below
            listing = add_address(listing)
            pprint.pprint(listing)
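One caveat: every listing in the loop above now triggers a live request to
the geocoding API, which enforces usage limits. A minimal sketch of a more
polite loop follows; the half-second pause is an arbitrary guess on our
part, not a documented quota:

.. code-block:: python

    import time

    for listing in extract_listings(doc):
        listing = add_address(listing)
        pprint.pprint(listing)
        time.sleep(0.5)  # arbitrary pause between API calls; tune as needed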
Give it a whirl, using the test approach so you don't hit Craigslist while
trying it out:

.. code-block:: bash

    [souptests] heffalump:souptests cewing$ python scraper.py test
    {'address': u'12339-12399 78th Avenue South, Seattle, WA 98178, USA',
     'description': u'2 BEDROOM 2 BATHROOM Zero Down Rent with Option to Buy',
     'link': u'/oly/apa/4345117401.html',
     'location': {'data-latitude': u'47.4924143400595',
                  'data-longitude': u'-122.235904626445'},
     'price': u'$960',
     'size': u'3br'}
    {'address': ...
    ...
    [souptests] heffalump:souptests cewing$

Nifty, eh?

Reduce Your Footprint
=====================

At the moment, all of our results need to be held in memory at the same
time. In this case that probably isn't too big a deal, but it's good
practice to be kind to your resources. Update the ``extract_listings``
function to turn it into a generator. Then we can process a single
apartment listing at a time, decreasing the memory requirements of our
script:

.. code-block:: python

    def extract_listings(parsed):
        location_attrs = {'data-latitude': True, 'data-longitude': True}
        listings = parsed.find_all('p', class_='row', attrs=location_attrs)
        # delete the line where you create a list in which to store
        # your listings
        for listing in listings:
            location = {key: listing.attrs.get(key, '')
                        for key in location_attrs}
            link = listing.find('span', class_='pl').find('a')
            price_span = listing.find('span', class_='price')
            this_listing = {
                'location': location,
                'link': link.attrs['href'],
                'description': link.string.strip(),
                'price': price_span.string.strip(),
                'size': price_span.next_sibling.strip(' \n-/')
            }
            # delete the line where you append this result to a list
            yield this_listing  # This is the only change you need to make

When you make that change, each individual listing will be yielded from the
``extract_listings`` generator, and you will be able to add an address to
each one without building all the rest first.

Going Further
=============

This would be a great opportunity to use asynchronous processing as well.
Can you think of a way to add an address to each listing using asynchronous
calls with ``gevent`` or ``tornado``? Consider this a standing challenge.
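If you want a hint, here is a minimal sketch of one possible shape using
``gevent``; the ``add_addresses`` wrapper is a hypothetical name of our own,
not part of any library:

.. code-block:: python

    import gevent
    from gevent import monkey

    # patch early, so the blocking socket calls inside requests
    # become cooperative and can run concurrently
    monkey.patch_all()


    def add_addresses(listings):
        """Geocode many listings concurrently; returns a list, not a generator."""
        jobs = [gevent.spawn(add_address, listing) for listing in listings]
        gevent.joinall(jobs)
        return [job.value for job in jobs]

Note that spawning one greenlet per listing materializes the whole result
set at once, trading back some of the memory savings from the generator
above.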