VALA2014 Session 2 Balnaves - VALA - Libraries / Technology and the Future

Complex harvesting for content from public sources and email

VALA2014 CONCURRENT SESSION 2: It’s All About the Data
Tuesday 4 February 2014, 12:00 – 12:30
Persistent URL: http://www.vala.org.au/vala2014-proceedings/vala2014-session-2-balnaves

Edmund Balnaves

Prosentient Systems, NSW

Please tag your comments, tweets, and blog posts about this session: #vala14 and #s6


Abstract

This paper presents the results of a project to build a complex harvesting system for web and email sources, integrated with open source platforms, to improve discovery of information about, or relevant to, the organisation from public internet sources. The paper discusses harvesting methods that draw on a mix of RSS, Google API search and simple web parsing. It also presents the results of automated metadata allocation and subsequent manual curation. The project highlights the need to combine multiple web scanning techniques: together they must be exhaustive enough to catch relevant references, yet specific enough to avoid an unduly large number of false-positive candidates for selection.
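The abstract does not describe the system's implementation, so the following is only a minimal sketch of the general pattern it outlines. It rests on several assumptions. Harvested RSS 2.0 feeds are already available as strings, and fetching them is not shown. Relevance uses a simple include/exclude keyword filter, which trades recall against false positives. "Automated metadata allocation" is approximated by keyword-based subject tags, and the results are left for manual curation. All function names, feeds, URLs (example.org) and the "Acme" organisation are hypothetical. Google API search or web-parsing harvesters would simply supply further batches of candidates to `merge_sources`.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One harvested reference awaiting manual curation (hypothetical model)."""
    title: str
    link: str
    text: str
    source: str
    tags: list = field(default_factory=list)


def parse_rss(xml_text: str, source: str) -> list:
    """Extract <item> entries from an RSS 2.0 document already fetched as text."""
    root = ET.fromstring(xml_text)
    candidates = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        desc = item.findtext("description", default="").strip()
        candidates.append(Candidate(title, link, f"{title} {desc}", source))
    return candidates


def merge_sources(*batches) -> list:
    """Combine candidates from several harvesters, keeping the first copy of each link."""
    seen, merged = set(), []
    for batch in batches:
        for c in batch:
            if c.link and c.link not in seen:
                seen.add(c.link)
                merged.append(c)
    return merged


def matches(term: str, text: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def curate(candidates, include, exclude, tag_rules) -> list:
    """Keep candidates that hit an include term and no exclude term; assign keyword tags."""
    selected = []
    for c in candidates:
        if any(matches(t, c.text) for t in include) and not any(matches(t, c.text) for t in exclude):
            c.tags = [tag for tag, words in tag_rules.items() if any(matches(w, c.text) for w in words)]
            selected.append(c)
    return selected


# Hypothetical sample feeds; news/1 appears in both to show de-duplication.
FEED_A = """<rss version="2.0"><channel>
<item><title>Acme Hospital opens new library</title><link>http://example.org/news/1</link>
<description>The Acme Hospital library now offers e-journals.</description></item>
<item><title>Acme Widgets quarterly results</title><link>http://example.org/news/2</link>
<description>Sales of widgets rose.</description></item>
<item><title>Local weather</title><link>http://example.org/news/3</link>
<description>Rain expected.</description></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<item><title>Acme Hospital opens new library</title><link>http://example.org/news/1</link>
<description>Duplicate of a story from another feed.</description></item>
<item><title>Research paper cites Acme Hospital</title><link>http://example.org/blog/7</link>
<description>A study of patient outcomes.</description></item>
</channel></rss>"""

TAG_RULES = {"library": ["library", "e-journals"], "research": ["research", "study"]}

if __name__ == "__main__":
    pool = merge_sources(parse_rss(FEED_A, "feed-a"), parse_rss(FEED_B, "feed-b"))
    for c in curate(pool, include=["acme"], exclude=["widgets"], tag_rules=TAG_RULES):
        print(c.link, c.tags, c.source)
```

The exclude list is what keeps the unrelated "Acme Widgets" story out, even though it matches the broad include term. Broad include terms raise recall, and exclusions trim the false positives before a human curator reviews the remainder.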

This work is licensed under a Creative Commons Attribution-NonCommercial License.