May 25th, 2011 - 20H00 @ CRI by Renaud Lifchitz
PyF is an open-source Python framework and platform dedicated to processing, mining, transforming and reporting on large volumes of data, and more.
What is an ETL? http://en.wikipedia.org/wiki/Extract,_transform,_load
PyF tube:
Producer code:
import feedparser, time

def get_source():
    d = feedparser.parse("http://rss.lemonde.fr/c/205/f/3050/index.rss")
    size = len(d['entries'])
    for i, entry in enumerate(d['entries']):
        progression_callback(float(i+1)/size*100)
        message_callback("[NEWS] %s" % entry.title)
        yield entry
        time.sleep(0.5)
Filter expression:
item.updated_parsed.tm_hour in range(12,14)
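To see what this expression keeps, here is a minimal sketch that runs it outside PyF on stand-in items. The names keep and make_item are hypothetical helpers introduced only for this illustration, and the items are plain objects that mimic the updated_parsed attribute of a feedparser entry; how PyF itself evaluates the expression is not shown here. Note that range(12,14) covers hours 12 and 13 only, so an item stamped 14:00 is dropped.

```python
import time
from types import SimpleNamespace

def keep(item):
    # Same expression as the filter above, wrapped in a function (hypothetical name).
    return item.updated_parsed.tm_hour in range(12, 14)

def make_item(title, stamp):
    # Hypothetical helper: builds a stand-in for a feedparser entry,
    # whose updated_parsed attribute is a time.struct_time.
    parsed = time.strptime(stamp, "%Y-%m-%d %H:%M")
    return SimpleNamespace(title=title, updated_parsed=parsed)

items = [
    make_item("morning", "2011-05-25 09:30"),
    make_item("noon", "2011-05-25 12:00"),
    make_item("lunch", "2011-05-25 13:59"),
    make_item("afternoon", "2011-05-25 14:00"),
]

# Only items whose hour is 12 or 13 pass the filter.
kept = [item.title for item in items if keep(item)]
print(kept)  # ['noon', 'lunch']
```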