Fabelier

a * Lab to make things

====== Flow-based Python programming ======

**May 25th, 2011 - 20H00 @ CRI by Renaud Lifchitz**

===== Introduction to PyF =====

PyF is an open-source Python framework and platform for processing large amounts of data: mining, transforming, reporting and more.

What is ETL (Extract, Transform, Load)? See http://en.wikipedia.org/wiki/Extract,_transform,_load

  * Project page: http://www.pyfproject.org/
  * Installation: http://www.pyfproject.org/en/getting-started
  * Configuration: http://www.pyfproject.org/en/getting-started/configuring
  * Architecture: http://www.pyfproject.org/en/welcome/components
  * Plugins: http://www.pyfproject.org/documentation/contents/plugins/
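
As the examples below suggest, PyF chains components into a "tube" through which items flow. As a rough, hypothetical illustration of this flow-based extract/transform/load idea, written with plain Python generators rather than PyF's own API (the stage names and sample records are made up), a three-stage pipeline could look like this:

<code python>
def extract(records):
    """Producer stage: yield raw records one at a time."""
    for record in records:
        yield record


def transform(records):
    """Transform stage: drop records without a title, normalise the others."""
    for record in records:
        title = record.get("title", "").strip()
        if title:
            yield {"title": title, "hour": int(record["hour"])}


def load(records, sink):
    """Load stage: append every record it receives to sink; return the count."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


# Hypothetical raw input; a real producer would read it from a feed, a file, etc.
raw = [
    {"title": " First story ", "hour": "12"},
    {"title": "", "hour": "9"},
    {"title": "Second story", "hour": "13"},
]

# The generators are chained lazily: each record travels through the whole
# pipeline before the next one is pulled from the source.
sink = []
written = load(transform(extract(raw)), sink)
</code>

Here written is 2 and sink holds the two cleaned records; the record with an empty title never reaches the load stage.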

===== First example: Simple RSS to CSV converter =====

PyF tube:

{{:workshops:pyf-web-extractor-fabelier.png?800|}}

Producer code:

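<code python>
import time

import feedparser


# progression_callback() and message_callback() are not defined here: they are
# assumed to be provided by the PyF runtime when the producer is executed.
def get_source():
    d = feedparser.parse("http://rss.lemonde.fr/c/205/f/3050/index.rss")
    size = len(d['entries'])
    for i, entry in enumerate(d['entries']):
        # Report progress as the percentage of entries already emitted
        progression_callback(float(i + 1) / size * 100)
        message_callback("[NEWS] %s" % entry.title)
        yield entry
        # Throttle the flow: pause half a second between entries
        time.sleep(0.5)
</code>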
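
The producer relies on progression_callback and message_callback, which it never defines (presumably PyF supplies them), so it cannot run on its own. A sketch for exercising the same loop outside PyF, with the callbacks and the entries passed in explicitly (the parameters, the delay argument and the sample entries are hypothetical):

<code python>
import time
from types import SimpleNamespace


def get_source(entries, progression_callback, message_callback, delay=0.0):
    """Same loop as the producer, with its dependencies injected (hypothetical)."""
    size = len(entries)
    for i, entry in enumerate(entries):
        progression_callback(float(i + 1) / size * 100)
        message_callback("[NEWS] %s" % entry.title)
        yield entry
        if delay:
            time.sleep(delay)


# Hypothetical stand-ins for feedparser entries and for PyF's callbacks
entries = [SimpleNamespace(title="Story %d" % n) for n in range(1, 5)]
progress, messages = [], []
items = list(get_source(entries, progress.append, messages.append))
</code>

With four entries the progress values are 25.0, 50.0, 75.0 and 100.0; passing delay=0.5 restores the original half-second pause.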
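
Filter expression, evaluated for each entry (item): it keeps only the entries whose update time (feedparser's updated_parsed, in UTC) falls between 12:00 and 13:59, since range(12, 14) contains the hours 12 and 13:

<code python>
item.updated_parsed.tm_hour in range(12, 14)
</code>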
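
The filter can be checked, and the CSV end of the tube imitated, in plain Python. This sketch wraps the expression in a function and writes the kept entries with the standard csv module instead of PyF's own output component; the entries, their title and link fields and the column order are hypothetical:

<code python>
import csv
import io
import time
from types import SimpleNamespace


def keep(item):
    """The filter expression from the tube, wrapped in a function."""
    return item.updated_parsed.tm_hour in range(12, 14)


def to_csv(items):
    """Write the title and link of every item as CSV and return the text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["title", "link"])
    for item in items:
        writer.writerow([item.title, item.link])
    return out.getvalue()


def entry(title, when):
    # Hypothetical stand-in for a feedparser entry, with only the fields used here
    return SimpleNamespace(
        title=title,
        link="http://example.org/" + title.lower(),
        updated_parsed=time.strptime(when, "%Y-%m-%d %H:%M"),
    )


entries = [
    entry("Breakfast", "2011-05-25 09:15"),
    entry("Lunch", "2011-05-25 12:30"),
    entry("Coffee", "2011-05-25 13:59"),
    entry("Meeting", "2011-05-25 14:00"),
]
csv_text = to_csv(filter(keep, entries))
</code>

Only the 12:30 and 13:59 entries are kept: 09:15 and 14:00 fall outside range(12, 14).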

===== Second example: Multi-page web scraper =====

PyF tube:

{{:workshops:pyf-simple-rss-fabelier.png?800|}}

Individual item XPath:

<code>
/html[1]/body[1]/section[1]/div[1]/article/header[1]/h1[1]/a[2]
</code>
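
To see what this path selects, it can be tried with Python's standard xml.etree.ElementTree, which understands a subset of XPath including positional predicates such as [1]. ElementTree refuses paths starting with /html on an element, so in this sketch the parsed root stands for the html step and the rest of the path is relative; the sample page is hypothetical and deliberately well-formed:

<code python>
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing page with two articles
PAGE = """<html><body><section><div>
  <article><header><h1>
    <a href="/users/alice">alice</a><a href="/news/first-story">First story</a>
  </h1></header></article>
  <article><header><h1>
    <a href="/users/bob">bob</a><a href="/news/second-story">Second story</a>
  </h1></header></article>
</div></section></body></html>"""

# The leading /html[1] step is dropped: the parsed root already is <html>
ITEM_PATH = "body[1]/section[1]/div[1]/article/header[1]/h1[1]/a[2]"

root = ET.fromstring(PAGE)
# "article" has no index, so the path matches the second link of every article
links = [(a.text, a.get("href")) for a in root.findall(ITEM_PATH)]
</code>

Because the article step carries no index, links contains one entry per article. Real pages usually need an HTML-tolerant parser such as lxml.html first, since ElementTree only accepts well-formed XML.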

XPath of the links to the other pages (the "Suivant", i.e. "Next", pagination links):

<code>
//a[starts-with(text(),"Suivant")]//@href
</code>
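
ElementTree has no starts-with() function, so this sketch emulates the XPath above by hand and uses it to walk through every page, which is what the multi-page scraper has to do. The crawl loop, the in-memory PAGES dictionary standing in for HTTP requests and the page URLs are all hypothetical; unlike the XPath, which also matches href attributes nested inside the link, the emulation only reads the link's own href:

<code python>
import xml.etree.ElementTree as ET
from collections import deque


def next_page_urls(root):
    """Hand-written equivalent of the "Suivant" XPath for a parsed page."""
    return [
        a.get("href")
        for a in root.iter("a")
        if (a.text or "").startswith("Suivant") and a.get("href")
    ]


def crawl(start_url, fetch, max_pages=50):
    """Follow "Suivant" links from start_url, fetching each URL at most once."""
    visited, queue = [], deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.append(url)
        queue.extend(next_page_urls(ET.fromstring(fetch(url))))
    return visited


# Hypothetical site: three pages chained by "Suivant" links, the last one has none
PAGES = {
    "/news": '<html><body><a href="/news?page=2">Suivant &gt;</a></body></html>',
    "/news?page=2": '<html><body><a href="/news">Page 1</a>'
                    '<a href="/news?page=3">Suivant &gt;</a></body></html>',
    "/news?page=3": "<html><body><p>No more pages</p></body></html>",
}

# A dictionary lookup stands in for an HTTP request returning the page markup
visited = crawl("/news", lambda url: PAGES[url])
</code>

The visited check and the max_pages cap keep a badly linked site from causing an endless crawl.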

Computed href attribute, turning the first relative link of base_item.href into an absolute URL on linuxfr.org:

<code>
"http://linuxfr.org%s" % base_item.href[0]
</code>
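
Plain string formatting works as long as every href starts with "/". A slightly more robust alternative (not what the tube uses) is urllib.parse.urljoin from the standard library, which also copes with links that are already absolute. This sketch assumes base_item.href is a list of matched values; the helper names and sample links are hypothetical:

<code python>
from urllib.parse import urljoin

BASE = "http://linuxfr.org"


def formatted_url(hrefs):
    """The tube's expression, with base_item.href passed in as a list."""
    return "http://linuxfr.org%s" % hrefs[0]


def absolute_url(hrefs, base=BASE):
    """Resolve the first matched href against base (hypothetical helper)."""
    return urljoin(base, hrefs[0])
</code>

Both give http://linuxfr.org/news/first-story for ["/news/first-story"], but for an already absolute link such as https://example.org/x only urljoin returns it unchanged; string formatting would produce http://linuxfr.orghttps://example.org/x.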
  
workshops/pyf.txt · Last modified: 2015/02/18 22:38 by k4ngoo