Fabelier

a * Lab to make things

Flow-based Python programming

May 25th, 2011, 20:00 @ CRI, by Renaud Lifchitz

Introduction to PyF

PyF is an open-source Python framework and platform for processing large amounts of data: mining, transforming, reporting and more.

What is ETL (Extract, Transform, Load)? See http://en.wikipedia.org/wiki/Extract,_transform,_load
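In an ETL chain, records are pulled from a source (extract), reshaped or filtered (transform) and written to a destination (load). The sketch below shows that chain with plain Python generators; it is not PyF's API, and the records, field names and age threshold are made up for illustration.

```python
import csv
import io


# Extract: yield raw records one at a time.
# (Hypothetical in-memory source standing in for a feed, file or database.)
def extract():
    raw = ["alice,34", "bob,27", "carol,41"]
    for line in raw:
        yield line


# Transform: parse each record and keep only the ones we want.
def transform(records):
    for line in records:
        name, age = line.split(",")
        age = int(age)
        if age >= 30:
            yield {"name": name.title(), "age": age}


# Load: write the transformed records to a destination (here, CSV text)
# and return how many rows were written.
def load(rows, out):
    writer = csv.DictWriter(out, fieldnames=["name", "age"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


out = io.StringIO()
n = load(transform(extract()), out)
print(n)
print(out.getvalue())
```

Each stage only sees one record at a time, so the whole chain streams instead of loading everything into memory first.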

First example: simple RSS to CSV converter

[Figure: PyF tube]

Producer code:

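import time

import feedparser

# progression_callback and message_callback are not defined in this snippet:
# they are supplied by the PyF environment when the producer runs.
def get_source():
    d = feedparser.parse("http://rss.lemonde.fr/c/205/f/3050/index.rss")
    size = len(d['entries'])
    for i, entry in enumerate(d['entries']):
        # Report progress as a percentage of the feed's entries
        progression_callback(float(i + 1) / size * 100)
        message_callback("[NEWS] %s" % entry.title)
        # Pass the entry on to the next node of the tube
        yield entry
        # Pause half a second between entries
        time.sleep(0.5)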
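To experiment with the producer outside PyF, the two callbacks can be replaced by simple stubs. In this sketch the stub callbacks, the hypothetical list of entries and the delay parameter are assumptions that stand in for PyF and feedparser, so it runs offline.

```python
import time

# Hypothetical stand-ins for the callbacks PyF supplies to a producer.
progress = []
messages = []


def progression_callback(percent):
    progress.append(percent)


def message_callback(text):
    messages.append(text)


# Same producer shape as above, but reading a hypothetical list of entries
# (dicts with a "title" key) instead of calling feedparser.
def get_source(entries, delay=0.0):
    size = len(entries)
    for i, entry in enumerate(entries):
        progression_callback(float(i + 1) / size * 100)
        message_callback("[NEWS] %s" % entry["title"])
        yield entry
        time.sleep(delay)


fake_entries = [{"title": "First"}, {"title": "Second"},
                {"title": "Third"}, {"title": "Fourth"}]
items = list(get_source(fake_entries))
print(progress)
print(messages)
```

Because get_source is a generator, the callbacks fire only as the consumer pulls each entry, so the reported progress follows what has actually been handed downstream.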

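Filter expression (keeps only entries updated between 12:00 and 13:59 UTC, since range(12,14) covers hours 12 and 13 and feedparser returns updated_parsed in UTC):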

item.updated_parsed.tm_hour in range(12,14)
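The filter expression is evaluated for each item coming out of the producer. The sketch below applies the same test as a Python predicate and then writes the surviving items to CSV. The items are hypothetical stand-ins for feedparser entries, and the CSV columns (title, updated) are an assumption, since the original tube's writer settings are not shown.

```python
import csv
import io
import time
from types import SimpleNamespace


# Hypothetical items shaped like feedparser entries: updated_parsed is a
# time.struct_time, so its hour is available as .tm_hour.
def make_item(title, stamp):
    return SimpleNamespace(title=title,
                           updated_parsed=time.strptime(stamp, "%Y-%m-%d %H:%M"))


items = [
    make_item("Morning news", "2011-05-25 09:15"),
    make_item("Lunch news", "2011-05-25 12:30"),
    make_item("Early afternoon", "2011-05-25 13:59"),
    make_item("Afternoon news", "2011-05-25 14:00"),
]


# The filter expression from the tube, written as a Python predicate.
def keep(item):
    return item.updated_parsed.tm_hour in range(12, 14)


# Write the surviving entries as CSV (the "to CSV" end of the tube).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["title", "updated"])
for item in filter(keep, items):
    writer.writerow([item.title,
                     time.strftime("%Y-%m-%d %H:%M", item.updated_parsed)])

print(out.getvalue())
```

The 14:00 item is dropped because range(12, 14) stops before 14.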

Second example: multi-page web scraper

[Figure: PyF tube]

Individual item XPath:

/html[1]/body[1]/section[1]/div[1]/article/header[1]/h1[1]/a[2]
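Read step by step, the item XPath selects, in every article of the page, the second <a> inside the <h1> of the article's header. The sketch below checks that reading with the standard library's ElementTree on a hypothetical, well-formed page. Real web pages usually need an HTML-tolerant parser (lxml, for instance), so this only illustrates what the expression matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page mimicking the structure the XPath expects.
PAGE = """<html><body><section><div>
  <article><header><h1><a href="/users/alice">alice</a><a href="/news/first">First story</a></h1></header></article>
  <article><header><h1><a href="/users/bob">bob</a><a href="/news/second">Second story</a></h1></header></article>
</div></section></body></html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# ElementTree's limited XPath is relative to root, so the leading /html[1]
# step is dropped; [n] selects the n-th sibling with that tag, as in XPath.
links = root.findall("./body[1]/section[1]/div[1]/article/header[1]/h1[1]/a[2]")
titles = [(a.text, a.get("href")) for a in links]
print(titles)
```

The article step has no position predicate, which is why one link comes back per article.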

Other pages URL XPath (follows the "Suivant", i.e. "Next", link):

//a[starts-with(text(),"Suivant")]//@href
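This XPath picks out the link to the next page, which lets the scraper move from one page of results to the following one. A minimal sketch of that loop, assuming a hypothetical in-memory site instead of real HTTP requests, with html.parser approximating the XPath match (it compares the link's concatenated text, not only its first text node):

```python
from html.parser import HTMLParser


# Approximates //a[starts-with(text(),"Suivant")]//@href: collect the href of
# every <a> whose text starts with "Suivant".
class NextLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = ""     # text collected inside that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = ""

    def handle_data(self, data):
        if self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self._text.startswith("Suivant"):
                self.links.append(self._href)
            self._href = None


# Hypothetical site: page path -> HTML, standing in for real fetches.
PAGES = {
    "/news?page=1": '<p>Story A</p><a href="/news?page=2">Suivant &raquo;</a>',
    "/news?page=2": '<p>Story B</p><a href="/news?page=3">Suivant &raquo;</a>',
    "/news?page=3": '<p>Story C</p><a href="/news?page=1">Précédent</a>',
}


def crawl(start):
    visited = []
    url = start
    while url is not None and url not in visited:
        visited.append(url)
        finder = NextLinkFinder()
        finder.feed(PAGES[url])
        finder.close()
        # Follow the first "Suivant" link, or stop if the page has none.
        url = finder.links[0] if finder.links else None
    return visited


order = crawl("/news?page=1")
print(order)
```

The visited list also guards against a site whose last page links back to the first.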

Computed attribute href (makes the extracted link absolute):

"http://linuxfr.org%s" % base_item.href[0]

Last modified: 2015/02/18 22:38 by k4ngoo