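There is a technique called "web scraping" for extracting structured data from Web 1.0 sites.
I think of it as a transitional technology, but for the time being it has a wide range of applications, and it gets interesting when combined with LWL.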
For Ruby, I found scrAPI and hpricot.
Last time I tried both for "fetching Technorati ranking data", and I found hpricot the more intuitive and easier to use of the two.
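To make the idea of scraping concrete, here is a minimal sketch that pulls rank/name pairs out of an HTML table. It deliberately uses only the Ruby standard library and a regular expression, not hpricot or scrAPI, so it is far less robust than either library. The HTML fragment, the `rank`/`name` class names, and the `extract_rankings` helper are all hypothetical and are not taken from the Technorati page.

```ruby
# Hypothetical HTML fragment standing in for a ranking page.
HTML = <<~PAGE
  <table>
    <tr><td class="rank">1</td><td class="name">Blog A</td></tr>
    <tr><td class="rank">2</td><td class="name">Blog B</td></tr>
  </table>
PAGE

# Matches one rank cell followed by one name cell (assumed markup).
ROW = %r{<td class="rank">(\d+)</td>\s*<td class="name">([^<]+)</td>}

# Turns the unstructured markup into an array of hashes.
def extract_rankings(html)
  html.scan(ROW).map { |rank, name| { rank: rank.to_i, name: name.strip } }
end

extract_rankings(HTML).each { |r| puts "#{r[:rank]}: #{r[:name]}" }
```

A real page would need a proper HTML parser, since a regular expression like this breaks as soon as the markup changes slightly.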
To automate the ranking updates, I tried to install hpricot in the Debian environment on my KURO-BOX PRO, but it failed.
For now, I have sent in a bug report.
Maybe I will give JRuby a try.
Trying the install, which fails
$ sudo gem install hpricot
Select which gem to install for your platform (arm-linux)
 1. hpricot 0.6 (mswin32)
 2. hpricot 0.6 (jruby)
 3. hpricot 0.6 (ruby)
 4. hpricot 0.5 (ruby)
 5. hpricot 0.5 (mswin32)
 6. Skip this gem
 7. Cancel installation
> 3
Building native extensions. This could take a while...
ERROR: While executing gem ... (Gem::Installer::ExtensionBuildError)
    ERROR: Failed to build gem native extension.
ruby extconf.rb install hpricot
extconf.rb:1:in `require': no such file to load -- mkmf (LoadError)
    from extconf.rb:1
Gem files will remain installed in /var/lib/gems/1.8/gems/hpricot-0.6 for inspection.
Results logged to /var/lib/gems/1.8/gems/hpricot-0.6/ext/hpricot_scan/gem_make.out
$
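Retrying with the development version
I found instructions for installing the development version, so I tried them.
Options 3 and 6 were both selectable, but both failed (there seems to be no difference between them).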
$ sudo gem install hpricot --source http://code.whytheluckystiff.net
Bulk updating Gem source index for: http://code.whytheluckystiff.net
Select which gem to install for your platform (arm-linux)
 1. hpricot 0.6 (mswin32)
 2. hpricot 0.6 (jruby)
 3. hpricot 0.6 (ruby)
 4. hpricot 0.6 (jruby)
 5. hpricot 0.6 (mswin32)
 6. hpricot 0.6 (ruby)
 7. Skip this gem
 8. Cancel installation
> 3
Building native extensions. This could take a while...
ERROR: While executing gem ... (Gem::Installer::ExtensionBuildError)
    ERROR: Failed to build gem native extension.
ruby extconf.rb install hpricot --source http://code.whytheluckystiff.net
extconf.rb:1:in `require': no such file to load -- mkmf (LoadError)
    from extconf.rb:1
Gem files will remain installed in /var/lib/gems/1.8/gems/hpricot-0.6 for inspection.
Results logged to /var/lib/gems/1.8/gems/hpricot-0.6/ext/hpricot_scan/gem_make.out
$ sudo gem install hpricot --source http://code.whytheluckystiff.net
Select which gem to install for your platform (arm-linux)
 1. hpricot 0.6 (mswin32)
 2. hpricot 0.6 (jruby)
 3. hpricot 0.6 (ruby)
 4. hpricot 0.6 (jruby)
 5. hpricot 0.6 (mswin32)
 6. hpricot 0.6 (ruby)
 7. Skip this gem
 8. Cancel installation
> 6
Building native extensions. This could take a while...
ERROR: While executing gem ... (Gem::Installer::ExtensionBuildError)
    ERROR: Failed to build gem native extension.
ruby extconf.rb install hpricot --source http://code.whytheluckystiff.net
extconf.rb:1:in `require': no such file to load -- mkmf (LoadError)
    from extconf.rb:1
Gem files will remain installed in /var/lib/gems/1.8/gems/hpricot-0.6 for inspection.
Results logged to /var/lib/gems/1.8/gems/hpricot-0.6/ext/hpricot_scan/gem_make.out
$
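
Every attempt stops at the same line: `no such file to load -- mkmf`. That message means Ruby could not find `mkmf.rb`, the standard library file that `extconf.rb` requires in order to build a native extension. On Debian that file is typically shipped in a separate development package (for Ruby 1.8, `ruby1.8-dev`), so a missing package is a likely cause, though I have not confirmed this on the KURO-BOX PRO. The sketch below only checks whether `mkmf.rb` is on the load path; `find_mkmf` is a hypothetical helper name, not part of RubyGems or hpricot.

```ruby
# Returns the first directory on the given load path that contains mkmf.rb,
# or nil when no directory does.
def find_mkmf(load_path = $LOAD_PATH)
  load_path.map(&:to_s).find { |dir| File.file?(File.join(dir, "mkmf.rb")) }
end

dir = find_mkmf
if dir
  puts "mkmf.rb found in #{dir}"
else
  # Assumption: without mkmf.rb, extconf.rb fails exactly as in the logs above.
  puts "mkmf.rb not found; native extensions cannot be built with this Ruby"
end
```

If the check reports that the file is missing, installing the distribution's Ruby development package and rerunning `gem install hpricot` would be the next thing to try.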