The Web of Data is built upon two simple ideas: first, use the RDF data model to publish structured data on the Web; second, set explicit RDF links between data items within different data sources. Background information about the Web of Data can be found on the wiki pages of the W3C Linking Open Data community effort, in the overview article Linked Data - The Story So Far, and in the tutorial How to publish Linked Data on the Web.

The Silk Link Discovery Framework supports data publishers in accomplishing the second task. Using the declarative Silk Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources, as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources to be interlinked via the SPARQL protocol and can therefore be used against both local and remote SPARQL endpoints.
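Accessing a data source over the SPARQL protocol amounts to sending a SPARQL query as an HTTP request parameter. As a minimal sketch (this is generic SPARQL-protocol usage, not Silk's own code, and the endpoint URL is hypothetical):

```python
# Sketch: building a SPARQL-protocol GET request with only the standard
# library. The endpoint "http://example.org/sparql" is a placeholder.
from urllib.parse import urlencode

def sparql_request_url(endpoint, query):
    """Build a SPARQL-protocol GET request URL for a SELECT query."""
    params = urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return endpoint + "?" + params

query = """
SELECT ?city ?label WHERE {
  ?city a <http://example.org/City> ;
        <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10
"""
url = sparql_request_url("http://example.org/sparql", query)
# The resulting URL can then be fetched with urllib.request.urlopen(url);
# the endpoint returns the bindings as SPARQL JSON results.
```

Because both local and remote endpoints speak the same protocol, the same request logic works against either.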

Silk is provided in three different variants which address different use cases:

  • Silk Single Machine is used to generate RDF links on a single machine. The datasets that should be interlinked can either reside on the same machine or on remote machines which are accessed via the SPARQL protocol. Silk Single Machine provides multithreading and caching. In addition, the performance is further enhanced using the MultiBlock blocking algorithm.
  • Silk MapReduce is used to generate RDF links between data sets using a cluster of multiple machines. Silk MapReduce is based on Hadoop and can for instance be run on Amazon Elastic MapReduce. Silk MapReduce enables Silk to scale out to very big datasets by distributing the link generation to multiple machines.
  • Silk Server can be used as an identity resolution component within applications that consume Linked Data from the Web. Silk Server provides an HTTP API for matching entities from an incoming stream of RDF data while keeping track of known entities. It can be used for instance together with a Linked Data crawler to populate a local duplicate-free cache with data from the Web.

All variants are based on the Silk Link Discovery Engine which offers the following features:

  • Flexible, declarative language for specifying linkage rules
  • Support of RDF link generation (owl:sameAs links as well as other types)
  • Employment in distributed environments (by accessing local and remote SPARQL endpoints)
  • Usable in situations where terms from different vocabularies are mixed and where no consistent RDFS or OWL schemata exist
  • Scalability and high performance through efficient data handling (speedup factor of 20 compared to Silk 0.2):
    • Reduction of network load by caching and reusing of SPARQL result sets
    • Multi-threaded computation of the data item comparisons (3 million comparisons per minute on a Core2 Duo)
    • Optional blocking of data items
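Blocking reduces the number of pairwise comparisons by only comparing items that share a block key. The following toy sketch illustrates the idea (it is not Silk's MultiBlock algorithm; the key function and similarity metric are deliberately simplistic):

```python
# Sketch of blocking for link discovery: items are grouped by a cheap key,
# and the expensive similarity metric is only applied within each block.
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(label):
    # Toy key: lowercase first character. Real blocking schemes use
    # smarter keys (e.g. token sets or index-based multiblocking).
    return label[:1].lower()

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(source, target, threshold=0.8):
    """Return (source, target) label pairs that are similar enough."""
    blocks = defaultdict(list)
    for t in target:
        blocks[block_key(t)].append(t)
    links = []
    for s in source:
        for t in blocks.get(block_key(s), []):
            if similarity(s, t) >= threshold:
                links.append((s, t))
    return links

print(match(["Berlin", "Munich"], ["berlin", "Hamburg", "München"]))
```

Without blocking, every source item would be compared against every target item; with blocking, "Berlin" is only compared against targets whose key is "b". The per-block comparisons are also independent, which is what makes multi-threaded (or MapReduce-distributed) computation straightforward.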

The Open Data Movement aims to make data freely available to everyone. Various interesting open data sets are already available on the Web. Examples include Wikipedia, Wikibooks, Geonames, MusicBrainz, WordNet, the DBLP bibliography and many more, which are published under Creative Commons or Talis licenses.

The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources.

RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications.
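An RDF link is simply an ordinary RDF triple whose subject and object identify resources in different data sources. In Turtle, an owl:sameAs link from a DBpedia resource to the corresponding Geonames resource looks roughly like this (the identifiers are illustrative):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://dbpedia.org/resource/Berlin>
    owl:sameAs <http://sws.geonames.org/2950159/> .
```

A Semantic Web browser or crawler that encounters this triple can dereference the object URI and continue into the Geonames data set.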

The figures below show the data sets that have been published and interlinked by the project so far. Collectively, the 295 data sets consist of over 31 billion RDF triples, which are interlinked by around 504 million RDF links (September 2011).


This document provides statistics about the structure and content of the LOD cloud. It also analyzes the extent to which LOD data sources implement nine best practices that are either recommended by the W3C or have emerged within the LOD community.

All statistics within this document are based on the LOD data set catalog that is maintained on CKAN. This document contains a preliminary release of the statistics. If you spot any errors in the data describing the LOD data sets, please correct them directly on CKAN. For information on how to describe data sets on CKAN, please refer to the Guidelines for Collecting Metadata on Linked Datasets in CKAN.


Dydra is a powerful graph database in the cloud.

It's tuned to make the most of highly connected data, like social networks.

It's affordable, fast, and easy to use.


Redland is a set of free software C libraries that provide support for the Resource Description Framework (RDF).

The Redland library packages are mature RDF packages, developed since 2000 and used in several projects.
Each library has its own news, detailed release notes,
and reference documentation with examples.
