Enhancing Solr with SIREn for Resume Search

Creating an advanced resume search solution poses a range of interesting challenges. One reason is that resumes are a rich mix of structured data (locations, past employers, dates) and unstructured free text (descriptions, roles, etc.).

SIREn is a specialized indexing system that enhances Solr with advanced capabilities for mixed free-text and structured search.

In this document, we take a simple resume schema and demonstrate how SIREn can add value to core search capabilities as well as facilitate innovative interfaces to search and recommendation.

A simple record schema for a resume:

Enhanced Ranking

Out of the box, SIREn can improve the precision and ranking of results for full text searches by taking the document structure into consideration.

Consider the case where a recruiter is looking to find an engineer with experience in “Solr” and “Hadoop”. These terms could appear in the generic “desc” field for the person as well as in the “desc” field specific to each position.

Clearly, the engineer is more relevant if the terms appear together in the desc field of a position, indicating that they have worked with both technologies at the same time. The engineer is more relevant still if the count of positions in which the terms appear together in the desc field is higher, indicating that they have done this multiple times.

Given that positions are an array of nested objects within the parent document, this kind of ranking is not easily achieved with Solr’s flattening approach. SIREn offers advanced field boosting that can use the full structure of the document, which allows the search priorities in the example above to be specified precisely.
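To make the ranking idea concrete, here is a minimal sketch in plain Python. It is not SIREn's API or query syntax; the records, the boost values and the `structural_score` helper are all assumptions made for illustration. It only shows why a score that counts co-occurrences per position needs the nested structure to be preserved.

```python
# Hypothetical resume records; only the "desc" and "positions" names come from the text.
resumes = [
    {"name": "A", "desc": "Engineer with Solr and Hadoop exposure",
     "positions": [{"desc": "Maintained a Solr cluster"},
                   {"desc": "Wrote Hadoop jobs"}]},
    {"name": "B", "desc": "Search engineer",
     "positions": [{"desc": "Indexed Hadoop output into Solr"},
                   {"desc": "Tuned Solr relevance over Hadoop logs"}]},
]

def contains_all(text, terms):
    """True if every term occurs as a word in the text (case-insensitive)."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in terms)

def structural_score(resume, terms, position_boost=2.0, top_level_boost=1.0):
    """Illustrative score: one boost per position whose desc holds all terms,
    plus a smaller boost if the top-level desc holds all terms."""
    co_occurring = sum(contains_all(p["desc"], terms) for p in resume["positions"])
    top = top_level_boost if contains_all(resume["desc"], terms) else 0.0
    return position_boost * co_occurring + top

ranked = sorted(resumes,
                key=lambda r: structural_score(r, ["Solr", "Hadoop"]),
                reverse=True)
print([r["name"] for r in ranked])
```

Candidate B ranks first because both terms appear together in two separate positions, whereas candidate A only mentions them together in the generic description.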

Structured Search and Exploration

SIREn’s structured search capability enables users to more precisely specify what they are looking for. This can be done, for example, through a form-based “advanced search” interface as well as through a more exploratory faceted interface.

Example 1

Search for candidates where the terms (e.g. Java and Hadoop) appear together in the description field of the same “position” object.

Example 2

Search for candidates who have worked at “Microsoft” as “Program Managers”. Traditional search with the “flattening” approach would return documents where the two terms do not necessarily appear in the same position object. SIREn matches both terms within the same position object and so returns only the correct results.
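The false positive produced by flattening can be reproduced in a few lines of plain Python. This is only a sketch of the matching semantics, not SIREn or Solr code; the record, the company name “Acme” and the `flatten`, `flat_match` and `nested_match` helpers are hypothetical.

```python
# Hypothetical record: this person was never a Program Manager at Microsoft.
resume = {
    "name": "C",
    "positions": [
        {"company": "Microsoft", "role": "Software Engineer"},
        {"company": "Acme", "role": "Program Manager"},
    ],
}

def flatten(doc):
    """Mimic flattening: each nested field becomes one multi-valued field,
    so the link between a company and its role is lost."""
    return {
        "positions.company": [p["company"] for p in doc["positions"]],
        "positions.role": [p["role"] for p in doc["positions"]],
    }

def flat_match(doc, company, role):
    """Match against the flattened fields independently."""
    flat = flatten(doc)
    return company in flat["positions.company"] and role in flat["positions.role"]

def nested_match(doc, company, role):
    """Require both values inside the same position object."""
    return any(p["company"] == company and p["role"] == role
               for p in doc["positions"])

print(flat_match(resume, "Microsoft", "Program Manager"))
print(nested_match(resume, "Microsoft", "Program Manager"))
```

The flattened check accepts the record because each value exists somewhere in the document, while the nested check rejects it because no single position carries both values.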

Example 3

One can imagine powering an interface where one sets a date (e.g. with a slider) and immediately sees the company’s composition and organigram at that time.

With SIREn’s structured search, this can be accomplished with a single query: search for people who have (or had) a position in which the company name matches and the start and end dates are respectively earlier and later than the given date.
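The condition described above can be sketched in plain Python as a filter over nested positions. This is an illustration of the logic only, not a SIREn query; the records, the `start` and `end` field names, the use of `None` for a current position and the `employed_at` helper are assumptions.

```python
from datetime import date

# Hypothetical records; an "end" of None is assumed to mean a current position.
people = [
    {"name": "D", "positions": [
        {"company": "Microsoft", "role": "Program Manager",
         "start": date(2010, 1, 1), "end": date(2013, 6, 30)}]},
    {"name": "E", "positions": [
        {"company": "Microsoft", "role": "Software Engineer",
         "start": date(2012, 3, 1), "end": None}]},
    {"name": "F", "positions": [
        {"company": "Google", "role": "Product Manager",
         "start": date(2011, 1, 1), "end": date(2014, 1, 1)}]},
]

def employed_at(people, company, on):
    """Return (name, role) pairs for positions at `company` that cover `on`.
    The company and both date bounds are checked on the same position object."""
    result = []
    for person in people:
        for position in person["positions"]:
            end = position["end"] or date.max
            if position["company"] == company and position["start"] <= on <= end:
                result.append((person["name"], position["role"]))
    return result

print(employed_at(people, "Microsoft", date(2011, 5, 1)))
print(employed_at(people, "Microsoft", date(2013, 1, 1)))
```

Moving the date from 2011 to 2013 changes the result from one employee to two, which is the behaviour a date slider would rely on.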

Example 4

With node positional operators, one can request people who have been in “commercial” roles after a “technical” role and an “academic” role.
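A rough way to picture such an ordering constraint is a single pass over the positions array in order. The sketch below is plain Python rather than SIREn's positional operators, and it assumes each position has already been labelled with a hypothetical role kind (“academic”, “technical” or “commercial”).

```python
def commercial_after_technical_and_academic(kinds):
    """True if some 'commercial' entry appears after both a 'technical'
    and an 'academic' entry in the ordered list of role kinds."""
    seen = set()
    for kind in kinds:
        if kind == "commercial" and {"technical", "academic"} <= seen:
            return True
        seen.add(kind)
    return False

# Hypothetical career paths, oldest position first.
print(commercial_after_technical_and_academic(["academic", "technical", "commercial"]))
print(commercial_after_technical_and_academic(["technical", "commercial", "academic"]))
```

The second path contains all three kinds of role but in the wrong order, so it does not match. A search over flattened fields could not tell the two paths apart.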

Enabling advanced “Relational search”

SIREn’s support for deeply nested JSON objects enables the materialisation of the relations between the root entity and other related entities. This enables very efficient “join-like” operations across multiple entities as part of a query, which in turn translates into more sophisticated search and ranking.

Consider the company field of the position object. Instead of just having the name of the company, one could add more materialized data e.g. also the company’s domain, size, location etc. This would allow users to query for all persons who were “Program Managers” in companies in the “enterprise software” domain with a size of “500 to 1000”.
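As an illustration, a materialized company object and the corresponding “join-like” condition might look as follows in plain Python. The records, the company names, the numeric `size` field and the `held_role_in` helper are assumptions made for this sketch; it does not show SIREn's query syntax.

```python
# Hypothetical records in which the company is a nested object, not just a name.
persons = [
    {"name": "G", "positions": [
        {"role": "Program Manager",
         "company": {"name": "ExampleSoft", "domain": "enterprise software", "size": 750}}]},
    {"name": "H", "positions": [
        {"role": "Program Manager",
         "company": {"name": "ExampleWeb", "domain": "consumer web", "size": 800}}]},
    {"name": "I", "positions": [
        {"role": "Software Engineer",
         "company": {"name": "ExampleSoft", "domain": "enterprise software", "size": 750}}]},
]

def held_role_in(person, role, domain, min_size, max_size):
    """True if one position combines the role with a company of the
    requested domain and size range."""
    return any(
        p["role"] == role
        and p["company"]["domain"] == domain
        and min_size <= p["company"]["size"] <= max_size
        for p in person["positions"]
    )

matches = [p["name"] for p in persons
           if held_role_in(p, "Program Manager", "enterprise software", 500, 1000)]
print(matches)
```

Because the company attributes live inside each position, the role and the company constraints are evaluated together without a separate lookup against a company collection.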

One can materialize relations up to a considerable depth with minimal loss in query performance. An example of this in action is the “Relational Faceting” interface, described in the next section.

Relational Faceting Interfaces

While all the examples above serve to enhance search over a collection of a single entity type, SIREn’s capabilities can also be used to let users interactively explore collections of multiple entity types that are interconnected with each other. We call this Relational Faceting.

To get a better idea, please see a demo of this kind of interface.

In the case of resumes, one could for example imagine three core types of entities: persons, positions and companies. With relational faceting, a user could browse and facet over all three types simultaneously.

For instance, a user can constrain role to “Program Manager” in the position collection and switch to the company collection to see all companies that have this role. The user could then constrain domain to “enterprise software” and switch to the person collection to see all the people who satisfy both criteria, i.e. all persons who have held the “Program Manager” position in “enterprise software” companies.
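The walk-through above can be mimicked with three small in-memory collections. This is a plain Python sketch of how constraints carry over from one collection to the next; the identifiers, records and variable names are hypothetical and say nothing about how SIREn implements relational faceting.

```python
# Hypothetical collections linked by identifiers.
companies = {
    "c1": {"name": "ExampleSoft", "domain": "enterprise software"},
    "c2": {"name": "ExampleWeb", "domain": "consumer web"},
}
persons = {"p1": "J", "p2": "K", "p3": "L"}
positions = [
    {"person": "p1", "company": "c1", "role": "Program Manager"},
    {"person": "p2", "company": "c2", "role": "Program Manager"},
    {"person": "p3", "company": "c1", "role": "Software Engineer"},
]

# Step 1: constrain role in the position collection.
selected_positions = [p for p in positions if p["role"] == "Program Manager"]

# Step 2: switch to the company collection; only companies with that role remain.
company_view = {p["company"] for p in selected_positions}

# Step 3: constrain domain in the company collection.
company_view = {c for c in company_view
                if companies[c]["domain"] == "enterprise software"}

# Step 4: switch to the person collection; both constraints still apply.
person_view = sorted(persons[p["person"]] for p in selected_positions
                     if p["company"] in company_view)
print(person_view)
```

Person L worked at the right company but in a different role, and person K held the right role at a company in another domain, so only J remains.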

Recommendation Search

SIREn is ideally suited to power real-time recommendations. “MoreLikeThis” searches, for instance, can leverage the full document structure, and even more so when enhanced “relational data” has been materialized in the documents.

Consider the case where a person has worked at “Microsoft” as a “Program Manager” and then moved to “Google” to work as a “Product Manager”. The documents for all persons with this career path will have two position objects in the same sequence with matching role and company attributes. SIREn’s support for positional operators in arrays allows you to search for and leverage these similarities.
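One simple way to picture this kind of similarity is to compare consecutive pairs of positions. The sketch below is plain Python, not SIREn's positional operators; the records and the `transitions` and `similar_by_career_path` helpers are hypothetical.

```python
def transitions(person):
    """Set of consecutive (company, role) -> (company, role) moves, in order."""
    steps = [(p["company"], p["role"]) for p in person["positions"]]
    return set(zip(steps, steps[1:]))

def similar_by_career_path(target, candidates):
    """Names of candidates sharing at least one ordered move with the target."""
    wanted = transitions(target)
    return [c["name"] for c in candidates if wanted & transitions(c)]

# Hypothetical records, oldest position first.
target = {"name": "M", "positions": [
    {"company": "Microsoft", "role": "Program Manager"},
    {"company": "Google", "role": "Product Manager"}]}
candidates = [
    {"name": "N", "positions": [
        {"company": "Acme", "role": "Intern"},
        {"company": "Microsoft", "role": "Program Manager"},
        {"company": "Google", "role": "Product Manager"}]},
    {"name": "O", "positions": [
        {"company": "Google", "role": "Product Manager"},
        {"company": "Microsoft", "role": "Program Manager"}]},
]

print(similar_by_career_path(target, candidates))
```

Candidate O holds the same two positions in the opposite order and is therefore not treated as similar, which is the distinction that order-aware matching makes possible.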

More generally, SIREn can be used to search for fully or partially matching subtrees within documents. Further, given that one can specify wildcards for keys within the tree, one can match against values with only partial matches for the keys themselves.
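The idea of partial subtree matching with wildcard keys can be sketched in plain Python using the standard `fnmatch` module for the key patterns. The `subtree_matches` and `find_subtree` functions and the sample document are assumptions made for illustration; SIREn's own matching semantics and syntax may differ.

```python
from fnmatch import fnmatchcase

def subtree_matches(pattern, node):
    """True if `node` contains `pattern`. Dict keys in the pattern may use
    shell-style wildcards; extra keys and list items in `node` are ignored."""
    if isinstance(pattern, dict):
        if not isinstance(node, dict):
            return False
        return all(
            any(fnmatchcase(key, pattern_key) and subtree_matches(pattern_value, value)
                for key, value in node.items())
            for pattern_key, pattern_value in pattern.items()
        )
    if isinstance(pattern, list):
        return isinstance(node, list) and all(
            any(subtree_matches(item, candidate) for candidate in node)
            for item in pattern
        )
    return pattern == node

def find_subtree(pattern, node):
    """Search the whole tree for any node that matches the pattern."""
    if subtree_matches(pattern, node):
        return True
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    return any(find_subtree(pattern, child) for child in children)

# Hypothetical document with a key the searcher only partially knows.
doc = {"name": "P", "positions": [
    {"role": "Program Manager", "company_name": "Microsoft", "desc": "Led releases"}]}

print(find_subtree({"role": "Program Manager", "company*": "Microsoft"}, doc))
print(find_subtree({"role": "Program Manager", "company*": "Google"}, doc))
```

The pattern only names two of the three keys in the position object and spells one of them with a wildcard, yet it still matches the right subtree.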

An incremental migration path

SIREn’s schemaless operation greatly simplifies application development: semi-structured data in XML or JSON can be indexed directly, without having to worry about flattening fields or resorting to query tricks to make up for the loss of structure.

This is of course great to know when one starts developing a new application, but how can one incrementally enhance existing applications, e.g. those written with only Solr or Elasticsearch in mind?

This is usually done by installing SIREn as a plugin in the existing Solr or Elasticsearch deployment and arranging for the data to be indexed both in the original fields and, in structured form, in the SIREn field(s).

This way, one can retain the existing application functionalities while augmenting them with new features that rely on structured SIREn queries. It is even possible to access Solr and SIREn fields in the same query.
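On the indexing side, the dual approach amounts to sending each record twice within the same document: once as the flat fields the existing application already uses, and once as the full structure. The sketch below builds such a document in plain Python. The field names, including the `json` field assumed to hold the structured copy, and the `to_index_document` helper are hypothetical, and the plugin configuration that would map a field to SIREn is not shown.

```python
import json

def to_index_document(resume, structured_field="json"):
    """Build one index document holding both the legacy flat fields and a
    serialized copy of the full nested record (field names are assumptions)."""
    return {
        "id": resume["id"],
        "name": resume["name"],
        "desc": resume["desc"],
        # Flattened multi-valued fields, as an existing application might use.
        "position_company": [p["company"] for p in resume["positions"]],
        "position_role": [p["role"] for p in resume["positions"]],
        # Structured copy that keeps the nesting intact.
        structured_field: json.dumps(resume, sort_keys=True),
    }

# Hypothetical record.
resume = {
    "id": "r1",
    "name": "Q",
    "desc": "Search engineer",
    "positions": [
        {"company": "Microsoft", "role": "Program Manager"},
        {"company": "Google", "role": "Product Manager"},
    ],
}

document = to_index_document(resume)
print(sorted(document))
```

Existing queries keep working against the flat fields, while new features can target the structured field.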

Renaud Delbru
