Ingestion Pipeline for RDF
Bhatia, Nipun; Seaborne, Andy
HPL-2007-110
Keyword(s): ingestion pipeline; validation of RDF; inferencing; large RDF datasets
Abstract: In this report we present the design and implementation of an ingestion pipeline for RDF datasets. Our definition of ingestion subsumes validation and inferencing. The proposed design performs these tasks without loading the data into memory. Several reasoners and Lint-like validators are available for RDF, but they require the data to be present in memory, which makes them infeasible for large datasets (~10 million triples). Our approach enables us to process such datasets. The pipeline validates data-specific information constraints by making certain closed-world assumptions, and it provides elementary inferencing support. We illustrate the system by processing large datasets (~10 million triples) from the Lehigh University Benchmark. To highlight the errors the system is capable of handling, we write our own ontology for an educational institute, together with data that contains errors.
31 Pages