The Haystack Conference is the premier event for search relevance. It took place April 22-25 in Charlottesville, VA. A few members of the Bonsai team were in attendance to hear about the latest innovations in search relevance. If you’re looking to improve your search results but aren’t sure where to start, here are some takeaways from a couple of talks that will help you get started.
Bots are common on Snag’s job search platform, and bot behavior can contaminate relevance training data even when bots account for just 5-10 percent of it.
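The talk didn’t detail Snag’s filter, but a minimal sketch of screening bot-like sessions out of training data might look like this. The session fields and thresholds here are illustrative assumptions, not values from the talk:

```python
def filter_bot_sessions(sessions, max_queries_per_min=30, min_avg_dwell_s=1.0):
    """Keep only sessions whose behavior looks human.

    Assumed session shape: {"queries": int, "minutes": float, "avg_dwell_s": float}.
    Both thresholds are illustrative guesses, not numbers from the talk.
    """
    human = []
    for s in sessions:
        query_rate = s["queries"] / max(s["minutes"], 1e-9)
        # Bots tend to issue queries far faster, and dwell far shorter,
        # than any human reading job descriptions.
        if query_rate <= max_queries_per_min and s["avg_dwell_s"] >= min_avg_dwell_s:
            human.append(s)
    return human
```

In practice a production filter would combine many more signals (user agents, IP reputation, click patterns), but even a crude rate cutoff removes the most obvious contamination before model training.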
Keyword stuffing in job descriptions is also common. Snag overcame this by relying on topical keywords rather than the stuffed text itself.
Intent can be difficult to infer. If a user doesn’t click on a result, are they actively skipping it, or did they simply never see it? To better train their machine learning models, Snag incorporates the last clicked result into their query logs.
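One common way to use the last click, consistent with what the talk describes: under a cascade assumption (users scan top-down), unclicked results ranked above the last click were probably examined and skipped, while results below it may never have been seen at all. A sketch of that labeling, with function and label names of my own choosing:

```python
def label_impressions(results, clicked_positions):
    """Label each result in a SERP using the position of the last click.

    Cascade assumption: the user scanned top-down, so unclicked results
    above the last click were actively skipped, while results below the
    last click may simply have gone unexamined.
    """
    if not clicked_positions:
        # No clicks at all: we cannot tell skipping from neglect.
        return [(doc, "unknown") for doc in results]
    last = max(clicked_positions)
    labels = []
    for pos, doc in enumerate(results):
        if pos in clicked_positions:
            labels.append((doc, "clicked"))
        elif pos < last:
            labels.append((doc, "skipped"))
        else:
            labels.append((doc, "unexamined"))
    return labels
```

Separating “skipped” from “unexamined” lets a training pipeline treat only the former as genuine negative signal.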
Deduplicating job results in a Search Engine Results Page (SERP) improves search quality, since job seekers don’t want to see too many jobs from the same company. To dedupe their SERPs, Snag uses Locality-Sensitive Hashing (LSH) to tag duplicates and near-duplicates, and then tunes results accordingly.
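The talk names LSH but not an implementation. A minimal MinHash-LSH sketch for flagging near-duplicate job postings might look like the following; the shingle size, hash count, and banding parameters are illustrative choices, not Snag’s:

```python
import hashlib
from collections import defaultdict

def shingles(text, k=3):
    """Split text into overlapping word k-grams (shingles)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def minhash_signature(shings, num_hashes=32):
    """Approximate a shingle set by its minimum hash under several seeded hashes.

    Two documents agree at a signature position with probability equal to
    the Jaccard similarity of their shingle sets.
    """
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shings)
        for seed in range(num_hashes)
    ]

def lsh_buckets(docs, num_hashes=32, bands=8):
    """Band the signatures; docs sharing any band bucket are duplicate candidates."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    # Only buckets holding more than one doc indicate possible duplicates.
    return {k: v for k, v in buckets.items() if len(v) > 1}
```

Postings that land in the same bucket can then be tagged at index time, so the SERP can demote or collapse all but one of them.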
Snag’s key measure of success is “Yield.” Search must produce relevant results for job seekers AND benefit job posters, so the Snag team created Yield, a metric that factors in both. Search has multiple stakeholders, and it’s important to bring them in when defining what success looks like.
Many teams also use human experts to gauge the quality of search results.
LexisNexis is a provider of legal, government, business, and high-tech information sources. Currently, they reference three petabytes of legal and news data spanning 65 billion documents, roughly three times the size of Wikipedia. LexisNexis hires legal experts to evaluate their search results. Here is their process, as laid out in the talk “Making the case for human judgment relevance testing” by Tara Diedrichsen and Tito Sierra.
First, they select raters based on their subject matter expertise, cost, and capacity. Each rater is given metrics to gauge their success.
Then, they select queries to measure. LexisNexis uses actual user queries as much as possible, but sometimes they test constructed queries.
The raters are then asked to rate the results for a query on a scale from 1-4, based on each document’s admissibility in court. LexisNexis also gathers verbatim comments alongside these ratings.
Scores are then generated. Like many search relevance teams, LexisNexis uses the industry-standard Discounted Cumulative Gain (DCG) metric to measure the quality of search results, which lets them compare the current search results against the raters’ feedback.
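The talk names DCG without spelling out a formula. One standard formulation uses a graded gain of 2^rel − 1 discounted by the log2 of rank, with NDCG normalizing against the ideal ordering so that scores are comparable across queries:

```python
import math

def dcg(ratings):
    """Discounted Cumulative Gain: graded relevance, discounted by log2 of rank.

    `ratings` is the list of relevance grades in the order the engine
    returned the documents (e.g. LexisNexis's 1-4 scale).
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(ratings))

def ndcg(ratings):
    """Normalize against the ideal (descending-relevance) ordering,
    yielding a 0-1 score comparable across queries."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal else 0.0
```

A perfectly ordered result page scores an NDCG of 1.0; any page that ranks a lower-rated document above a higher-rated one scores less, which is exactly the gap the rater feedback surfaces.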
The results are then communicated to stakeholders across LexisNexis, including Development, Product, and other teams in the organization.
These were just a few of the many great talks from Haystack. If you’re interested in watching the videos when they’re posted, keep an eye on the Haystack website.