Relational Joins for Elasticsearch: the SIREn Join Plugin

Relational Joins for Elasticsearch: the SIREn Join Plugin

We are very very happy to announce today the general availability of SIREn Join, an open source plugin for Elasticsearch which allows ultra fast “Filter Joins” across Elasticsearch indexes.


It is often the case that one needs to filter something by the property of something else. E.g. “articles” which discuss “companies that have been funded not later than 3 months ago”, etc.

Sometimes, one can use things like Parent-Child in Elasticsearch. At other times (see scenarios below), this is not feasible and uglier/hard to maintain solutions are adopted, typically by materializing all sorts of data into the records to be able to search and filter.

With SIREn Join, you can now say no to this practice and keep things clean, pretty much as you would in a normalized RDBMS.

Actually cleaner, thanks to the fact that ES, unlike DBs, nicely supports multi-valued attributes (see below).

Currently, one type of join is implemented which we call “Filter Join”, but our objective is to add more types of joins in the future.

The Filter Join is basically a (left) semi-join between two sets of documents based on a common attribute, where the result only contains the attributes of one of the joined set of documents. This join is used to filter one document set based on a second document set, hence its name. It is equivalent to the EXISTS() operator in SQL.

The Filter Join supports joins on both numerical and textual fields, but the joined attributes must be of the same type. You can also freely combine and nest multiple Filter Joins using boolean operators (conjunction, disjunction, negation) to create complex query plans.

It is fully integrated with the Elasticsearch API and is compatible with distributed environments.

How Does It Compare With Parent-Child

The Filter Join is similar in nature to the Parent-Child feature of Elasticsearch: they each perform a join at query-time. However, there are important differences between them:

  • The parent document and all of its children must live on the same shard, which limits its scalability and flexibility. Filter Join removes this constraint: it allows for the joining of documents across shards and across indices.
  • Thanks to the data locality of the Parent-Child model, joins are faster and more scalable. The Filter Join on the contrary, needs to transfer data across the network to compute joins across shards, limiting its scalability – albeit in a different way –  and performance.

There is no “one size fits all” solution to this problem, and one must understand the requirements to choose the proper solution. As a basic rule, if the data model and data relationships are purely hierarchical (or can be mapped to a purely hierarchical model), then the Parent-Child model might be more appropriate. If, on the contrary, you need to query both directions of a data relationship (e.g. as articles can mention more than a company and visa-versa), then the Filter Join might be more appropriate.

Easy N to M Relationships via Multi-Valued Attributes

The most important requirement for the Filter Join is to have a common shared attribute between two indices. For example, let’s take a simple relational data model composed of two tables, “articles" and “companies", and of one junction table, “ArticlesMentionCompanies" to encode the many relationships between them.

This model can be mapped to just two Elasticsearch indices: Articles and Companies. Compared to a traditional database model where a junction table is necessary, this model is simplified by leveraging multi-valued fields; article documents simply need to have a multi-valued field, “mentions", with the unique identifiers of the companies mentioned in the article. In other words, the field “mentions" is a foreign key in the “articles" table that refers to the primary key of the “companies" table.

How Fast is it?

Very. On a regular laptop we easily get 1 to 5 million results from a fully-connected join between 2 indexes containing 1 to 5 million records each in 40 to 60 MS.

Leverage it using our Kibana version, Kibi

We have engineered Kibi, our Friendly Kibana Fork around the extra capabilities that the SIREn Join plugin gives. Kibi allows (graphically!) configuring the relationships between the Kibana indexes and allows one to apply cross index filters as easily as clicking on a check box that “links” 2 dashboards together.

If you haven’t seen Kibi, we highly recommend you take a look at some of the videos on our youtube channel ! 🙂

Kibi is available at

Taking SIREn Join it for a spin (Getting Started)

Let’s take it for a spin: install the SIREn Join plugin in Elasticsearch, load two collections of documents inter-connected by a common attribute, and execute a relational query across the two collections within the Elasticsearch environment.


This guide requires that you have downloaded and installed the Elasticsearch 1.7.x distribution on your computer. If you do not have an Elasticsearch distribution, you can run the following commands:

Installing the SIREn Join Plugin

Before starting Elasticsearch, you have to install the SIREn Join plugin. Assuming that you are in the Elasticsearch installation directory, you can run the following command:

In case you want to remove the plugin, you can run the following command:

Starting Elasticsearch

To launch Elasticsearch, run the following command:

In the output, you should see a line like the following, which indicates that the SIREn Join plugin is installed and running:

Loading Some Relational Data

We will use a simple synthetic dataset for the purpose of this demo. The dataset consists of two collections of documents: Articles and Companies. An article is connected to a company with the attribute “mentions". Articles will be loaded into the “articles" index and companies in the “companies" index. To load the dataset, run the following command:

Relational Querying of the Data

We will now show you how to execute a relational query across the two indices. For example, we would like to retrieve all the articles that mention companies whose name matches “orient". This relational query can be decomposed in two search queries: the first one to find all the companies whose name matches “orient", and a second query to filter out all articles that do not mention a company from the first result set. The SIREn Join plugin introduces a new Elasticsearch filter, named “filterjoin", that allows to define such a query plan and a new search API _coordinate_search that allows it to execute this query plan. Below is the command to run the relational query:

  1. The filterjoin definition
  2. The path on the target index (i.e., articles) containing the foreign keys
  3. The definition of the source index (i.e., companies)
  4. The path to the primary key of the “companies" index
  5. The query that will be used to filter out companies

The command should return the following response with two search hits:

You can also reverse the order of the join, and query for all the companies that are mentioned in articles whose title matches “nosql":

The command should return the following response with three search hits:

What about Elasticsearch 2.0 compatibility?

We’ll release the 2.0 compatible version shortly, stay tuned!! (subscribe here below :))

Renaud Delbru

No Thanks / Already Signed Up