**********
Session 04
**********

.. figure:: /_static/granny_mashup.png
    :align: center
    :width: 70%

    Paul Downey http://www.flickr.com/photos/psd/492139935/ - CC-BY

Scraping, APIs and Mashups
==========================

Wherein we learn how to make order from the chaos of the wild internet.

A Dilemma
---------

The internet makes a vast quantity of data available.

.. rst-class:: build
.. container::

    But not always in the form or combination you want.

    It would be nice to be able to combine data from different sources to
    create *meaning*.

The Big Question
----------------

.. rst-class:: large centered

But How?

The Big Answer
--------------

.. rst-class:: large centered

Mashups

Mashups
-------

A mashup is::

    a web page, or web application, that uses and combines data,
    presentation or functionality from two or more sources to create new
    services.

    -- wikipedia (http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid))

Data Sources
------------

The key to mashups is the idea of data sources.

.. rst-class:: build
.. container::

    These come in many flavors:

    .. rst-class:: build

    * Simple websites with data in HTML
    * Web services providing structured data
    * Web services providing transformative service (geocoding)
    * Web services providing presentation (mapping)

Web Scraping
============

.. rst-class:: left
.. container::

    It would be nice if all online data were available in well-structured
    formats.

    .. rst-class:: build
    .. container::

        The reality is that much data is available only in HTML.

        Still we can get at it, with some effort.

        By scraping the data from the web pages.

HTML
----

.. ifnotslides::

    Ideally, it looks like this:

.. code-block:: html

    <p>A nice clean paragraph</p>
    <p>And another nice clean paragraph</p>
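
As a minimal sketch of what scraping such clean markup could involve, the
following uses only the standard library's ``html.parser`` module. The
``ParagraphExtractor`` class, the ``extract_paragraphs`` helper, and the sample
markup (each paragraph wrapped in ``<p>`` tags) are assumptions made for
illustration, not part of the session's own code.

.. code-block:: python

    from html.parser import HTMLParser


    class ParagraphExtractor(HTMLParser):
        """Collect the text of each <p> element (hypothetical helper)."""

        def __init__(self):
            super().__init__()
            self.paragraphs = []
            self._buffer = None  # None while we are outside a <p> element

        def handle_starttag(self, tag, attrs):
            # Start buffering text when a paragraph opens.
            if tag == "p":
                self._buffer = []

        def handle_data(self, data):
            # Keep text only while inside a paragraph.
            if self._buffer is not None:
                self._buffer.append(data)

        def handle_endtag(self, tag):
            # Store the buffered text when the paragraph closes.
            if tag == "p" and self._buffer is not None:
                self.paragraphs.append("".join(self._buffer).strip())
                self._buffer = None


    def extract_paragraphs(html):
        """Return the text of every <p> element found in ``html``."""
        parser = ParagraphExtractor()
        parser.feed(html)
        parser.close()
        return parser.paragraphs


    # Assumed sample markup, standing in for a well-formed page.
    clean_html = (
        "<p>A nice clean paragraph</p>"
        "<p>And another nice clean paragraph</p>"
    )
    print(extract_paragraphs(clean_html))

A parser this simple only copes with well-formed, unnested paragraphs, which is
why messier markup tends to call for more forgiving tools.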

.. nextslide:: HTML... IRL

.. ifnotslides::

    But in real life, it's more often like this:

.. code-block:: html
