
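RSS makes it very easy to keep track of the latest changes on a page, but unfortunately not every site provides a feed. Google Reader can generate a feed for pages that have none, but it only handles English pages. It is helpless with Chinese or Japanese pages, and with pages that block Google's crawler, for example: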

Generated feed for “http://www.zju.edu.cn/”
from http://www.zju.edu.cn/ Google feed by Google
* Google was not able to access this page to check for updates. This page may be unavailable or have other restrictions that prevent Google from getting updates.

So I wrote a simple script that generates an RSS feed for these pages myself.

#!/usr/bin/perl
# wafeed.pl

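use AnyDBM_File;
use DBM_Filter;
use Encode qw(decode_utf8);
use File::Spec;
use LWP::Simple qw(get);
use XML::FeedPP;

# Load the settings ($rssfile, $dmbfile, $title, $link, $description,
# $itemnum, %config) from config.pl or from the file named on the command
# line. The path is made absolute so that require still finds it on
# Perl 5.26+, where "." is no longer in @INC.
$config = File::Spec->rel2abs($ARGV[0] || 'config.pl');
require $config;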

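$time = time;
# Reopen the existing feed so earlier items are kept; otherwise start a new one.
if (-e $rssfile) {
    $feed = new XML::FeedPP::RSS($rssfile, utf8_flag => 1);
} else {
    $feed = new XML::FeedPP::RSS;
}
# Channel-level metadata comes from the config file.
$feed->title($title);
$feed->link($link);
$feed->pubDate($time);
$feed->description($description);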

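# %history maps each page key to the content seen on the previous run;
# the utf8 filter encodes and decodes keys and values on their way to disk.
dbmopen(%history, $dmbfile, 0666) or die "Cannot open $dmbfile: $!";
(tied %history)->Filter_Push('utf8');
while (($key, $cfg) = each %config) {
    # Fetch the page; an optional handler can cut out the part worth watching.
    $value = get($cfg->{'link'});
    $value = $cfg->{'handler'}($value) if defined $cfg->{'handler'};
    $value = decode_utf8($value);
    # Add an item only when the content is non-empty and differs from
    # what was stored last time.
    if ($value !~ /^\s*$/ && $value ne $history{$key}) {
        $history{$key} = $value;
        $feed->add_item(
            title => $cfg->{'title'},
            link => $cfg->{'link'},
            pubDate => $time,
            description => $value);
    }
}
dbmclose %history;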

# Sort the items by pubDate, keep at most $itemnum of them, and write the feed.
$feed->sort_item();
$feed->limit_item($itemnum);
$feed->to_file($rssfile);
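The script reads all of its settings from a separate file (config.pl by default, or the path given as the first argument), which is not shown above. The sketch below is a hypothetical config.pl, inferred only from the variables the script uses: $rssfile, $dmbfile, $title, $link, $description, $itemnum, and %config, whose entries carry title, link and an optional handler. All values, the HTML comment markers, and the extract_between helper are illustrative assumptions, not the author's actual configuration.

```perl
# config.pl -- hypothetical example configuration for wafeed.pl.
# Variable names match what wafeed.pl reads; every value, the page
# markers and extract_between() are assumptions for illustration.

$rssfile     = 'wafeed.xml';       # output RSS file
$dmbfile     = 'wafeed_history';   # DBM file with the last seen content
$title       = 'Watched pages';
$link        = 'http://www.zju.edu.cn/';
$description = 'Changes detected by wafeed.pl';
$itemnum     = 20;                 # keep at most 20 items in the feed

# Hypothetical helper: return the text between two literal markers,
# or an empty string if the page or either marker is missing.
sub extract_between {
    my ($html, $start, $end) = @_;
    return '' unless defined $html;
    my $i = index($html, $start);
    return '' if $i < 0;
    $i += length $start;
    my $j = index($html, $end, $i);
    return '' if $j < 0;
    return substr($html, $i, $j - $i);
}

%config = (
    zju => {
        title   => 'ZJU home page',
        link    => 'http://www.zju.edu.cn/',
        # Compare only the (assumed) news block, so unrelated changes
        # elsewhere on the page do not produce new feed items.
        handler => sub {
            extract_between($_[0], '<!-- news start -->', '<!-- news end -->');
        },
    },
);

1;  # require needs the file to return a true value
```

Returning an empty string when a marker is missing fits the script's own check: an all-whitespace value is skipped rather than recorded as a change. Running `perl wafeed.pl config.pl` periodically (for example from cron) would then add an item only when the extracted block differs from the copy stored in the DBM file.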

In the program
