SIREn – short for Semantic Information Retrieval Engine – is a plugin for Lucene, Solr and Elasticsearch that significantly enhances the performance, scalability, and flexibility of these search engines when dealing with nested data.
SIREn encodes the structure of documents in the index using a completely different model from Lucene's. To do so, it uses its own disk format, low-level compression algorithm, and query operator implementation, which gives much higher scalability and performance, especially when dealing with complex nested or structured documents. A comparison between the regular method (Blockjoin) and SIREn can be found in this blog post.
SIREn was originally created by Renaud Delbru as part of his Ph.D. studies. SIREn’s original use case was Sindice.com (2007-2012), a research-oriented web search engine targeted at the “Web of Data”: all the pages that contained machine-readable annotations (RDF, RDFa, Microformats, OpenGraph, Schema.org, …).
Within Sindice, SIREn enabled rich queries to locate web pages based on both the text and the structure of the data on the page. At its peak, Sindice contained 700+ million marked-up web pages, served in production by a cluster of 4 SIREn indexes.
Since 2013, SIREn has been commercially developed by SIREn Solutions (originally under the name SindiceTech), a spin-off created to bring SIREn to its full potential.
 R. Delbru, S. Campinas, G. Tummarello. Searching Web Data: an Entity Retrieval and High-Performance Indexing Model. In Journal of Web Semantics, 2011.
 G. Tummarello, R. Delbru. End of Support for Sindice.com: History, Lessons learnt and Legacy. In SemanticWeb.com.
SIREn is Free and Open Source, under the same licensing model as MongoDB.
1) The SIREn Core and QParser modules are released under the GNU Affero General Public License, Version 3.0, which is a copyleft license. This means that if you extend or modify the source code of these modules, you have to contribute those modifications back to the community.
2) The SIREn Solr and Elasticsearch modules are released under the Apache License, Version 2.0, which is not a copyleft license.
In practice, this means you can build client applications that use the SIREn Solr or Elasticsearch plugin without having to contribute your client applications back to the community.
We understand that some legal departments may have questions about this. If so, please let us know and we can provide a signed letter confirming the above.
Finally, if the above isn’t enough to satisfy your organization’s legal department (some will not approve GPL in any form), please contact us: commercial licenses are available, including free evaluation licenses.
Free, best-effort support is provided via email and by the community.
For commercial-grade support, please contact us.
It is possible to use a Lucene feature called Blockjoin to index and search nested data with Solr or Elasticsearch. Blockjoin works well for simple structured documents and relatively simple use cases, but beyond that it will quickly show its limitations.
SIREn, on the other hand, was designed from the ground up for high-performance, highly scalable search over complex, deeply nested documents.
Please see our white paper in which we compare Blockjoin with SIREn on a collection of US Patent Office documents.
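To make the Blockjoin approach more concrete, here is a minimal Python sketch of how a nested document ends up as a block of flat documents, with each nested object emitted before its parent. It only illustrates the general idea. The function name to_block, the _type field, and the sample patent record are hypothetical and are not part of the Lucene, Solr, or Elasticsearch APIs.

```python
from typing import Any


def to_block(doc: dict[str, Any], doc_type: str = "parent") -> list[dict[str, Any]]:
    """Flatten one nested document into a Blockjoin-style block.

    Every nested object becomes its own flat document. Children are
    emitted before their parent, so the top-level document is always
    the last entry of the block.
    """
    block: list[dict[str, Any]] = []
    flat: dict[str, Any] = {"_type": doc_type}  # hypothetical type marker
    for key, value in doc.items():
        if isinstance(value, dict):
            # A single nested object: emit it (and its descendants) first.
            block.extend(to_block(value, doc_type=key))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # A list of nested objects: one child document per element.
            for child in value:
                block.extend(to_block(child, doc_type=key))
        else:
            # Scalar values stay on the current flat document.
            flat[key] = value
    block.append(flat)
    return block


# Hypothetical sample record, loosely modelled on a patent document.
patent = {
    "title": "Widget",
    "inventors": [
        {"name": "Ada", "address": {"city": "Galway"}},
        {"name": "Bob", "address": {"city": "Dublin"}},
    ],
}

block = to_block(patent)
for flat_doc in block:
    print(flat_doc)
```

Even this small record turns into several flat documents, and the count grows with every extra level of nesting.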
In SIREn, the parent-child relationships of the nested elements are materialised and indexed into the same document. This is at the core of SIREn's very high performance and scalability.
This means that a change in the nested part of a document requires reindexing the full document. Reindexing can be done at very high speed, so in many scenarios this is acceptable. If, however, you expect frequent updates of nested elements, contact us and we will be happy to discuss hybrid approaches.
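The trade-off can be illustrated with a toy in-memory index in Python. This is only a sketch under simplifying assumptions and does not reflect SIREn's actual disk format or API. The names materialise and ToyIndex, and the sample record, are hypothetical. Each nested value is stored with its full path inside a single document entry, so changing one nested value means rebuilding the whole entry.

```python
from typing import Any, Iterator


def materialise(node: Any, path: tuple = ()) -> Iterator[tuple[tuple, Any]]:
    """Yield (path, value) pairs, recording the nesting of every leaf."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from materialise(value, path + (key,))
    elif isinstance(node, list):
        for position, value in enumerate(node):
            yield from materialise(value, path + (position,))
    else:
        yield path, node


class ToyIndex:
    """Hypothetical single-document index: one entry list per document."""

    def __init__(self) -> None:
        self.entries: dict[str, list[tuple[tuple, Any]]] = {}
        self.reindex_count = 0

    def index(self, doc_id: str, doc: dict) -> None:
        # The whole document is re-materialised on every call; there is
        # no way to update a single nested element in place.
        self.entries[doc_id] = list(materialise(doc))
        self.reindex_count += 1

    def search(self, field: str, value: Any) -> list[str]:
        # Match leaves whose innermost path step is `field`.
        return [
            doc_id
            for doc_id, pairs in self.entries.items()
            if any(path and path[-1] == field and v == value for path, v in pairs)
        ]


toy = ToyIndex()
record = {
    "title": "Widget",
    "inventors": [
        {"name": "Ada", "address": {"city": "Galway"}},
        {"name": "Bob", "address": {"city": "Dublin"}},
    ],
}
toy.index("p1", record)

# Changing one nested value still requires indexing the full document again.
record["inventors"][1]["address"]["city"] = "Cork"
toy.index("p1", record)

print(toy.search("city", "Cork"), toy.reindex_count)
```

Because the structure lives inside one entry, a query never has to join separate child documents, which is the benefit this design pays for with full-document reindexing.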
Yes, all SIREn modules are available from the public Maven Central repository under the group id com.sindicetech.siren, including the source and test artifacts.