Flow-based Python programming

May 25th, 2011 - 20H00 @ CRI by Renaud Lifchitz

Introduction to PyF

PyF is an open-source Python framework and platform for large-scale data processing, mining, transformation, reporting and more.

What is an ETL? http://en.wikipedia.org/wiki/Extract,_transform,_load

First example: simple RSS to CSV converter

PyF tube:

Producer code:

import feedparser

def get_source():
    # Fetch and parse the Le Monde RSS feed.
    d = feedparser.parse("http://rss.lemonde.fr/c/205/f/3050/index.rss")
    for entry in d['entries']:
        # message_callback is not defined here; it is assumed to be supplied by the PyF runtime.
        message_callback("[NEWS] %s" % entry.title)
        yield entry

Filter expression:

item.updated_parsed.tm_hour in range(12,14)
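
Outside PyF, the same flow (producer, then filter, then CSV output) can be approximated with plain Python generators. The sketch below is not PyF's API: `make_entry`, `produce`, `keep_lunchtime` and `to_csv` are hypothetical names, and the entries are built in memory in place of a parsed feed so that the example runs without network access.

```python
import csv
import io
import time
from types import SimpleNamespace


def make_entry(title, hour):
    """Build a stand-in for a feedparser entry (hypothetical sample data)."""
    # 9-field struct_time: year, month, day, hour, min, sec, wday, yday, isdst
    stamp = time.struct_time((2011, 5, 25, hour, 0, 0, 2, 145, 0))
    return SimpleNamespace(title=title, updated_parsed=stamp)


def produce(entries):
    """Producer: yield entries one at a time, as get_source() does."""
    for entry in entries:
        yield entry


def keep_lunchtime(items):
    """Filter: same test as the filter expression (hours 12 and 13)."""
    for item in items:
        if item.updated_parsed.tm_hour in range(12, 14):
            yield item


def to_csv(items):
    """Consumer: write one CSV row per item and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "hour"])
    for item in items:
        writer.writerow([item.title, item.updated_parsed.tm_hour])
    return buffer.getvalue()


entries = [
    make_entry("Morning news", 9),
    make_entry("Noon news", 12),
    make_entry("Lunch news", 13),
    make_entry("Afternoon news", 14),
]

# Chain the stages; each item flows through the whole chain lazily.
csv_text = to_csv(keep_lunchtime(produce(entries)))
print(csv_text)
```

Note that `range(12, 14)` excludes 14, so only items whose hour is 12 or 13 reach the CSV stage.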

Second example: Multi-page web scraper

PyF tube:

Individual item XPath:


Other pages url XPath:


href computed attribute:

"http://linuxfr.org%s" % base_item.href[0]
workshops/pyf.txt · Last modified: 2015/02/18 22:38 by k4ngoo