Search Term Validation in eDiscovery and the Science Behind It

Search term validation in eDiscovery is the process of assessing and confirming the effectiveness and defensibility of search terms used to retrieve relevant electronic documents during legal investigations or litigation. It is a critical step in the eDiscovery workflow to ensure the search process is accurate, reliable, and comprehensive. By validating search terms, teams minimize the risk of missing important documents (false negatives) and of pulling in irrelevant ones (false positives), thereby improving the efficiency and defensibility of the eDiscovery process.

If you took biology or chemistry in school, you learned the scientific method: forming a hypothesis, designing and validating your experiments, executing them, and then analyzing the results. This is the foundation of search term validation, a systematic and evidence-based approach to reliably validating search terms.

In legal, we’re always looking for efficiencies, because the volume of ESI has grown exponentially but eDiscovery deadlines haven’t extended. It is devastating to get to the end of a review and realize you had a 30% responsive rate. That means 70% of the documents you reviewed were not even needed for the case! This ratio is not cost-effective for the client, and not an efficient use of anyone’s time. Having a defensible way to look at the data, and knowing why it’s defensible, are both important.

Several key metrics play a critical role in assessing the accuracy and defensibility of your results. They provide an objective measure of how effective your search terms are. Three of them are especially important to search term validation: precision, recall, and F1 score.

  • Precision is the percentage of the documents retrieved by a search term that are actually responsive. This refers to the quality of your search.
  • Recall is the percentage of responsive documents in a document population that a search or review process finds. This refers to the comprehensiveness of your search.
  • An F1 score is the harmonic mean of precision and recall, a single number that evaluates both together. This refers to the balance of precision and recall. The resulting score lets you evaluate what you did and gives you the information you need to adjust precision or recall, depending on the needs of the case.
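As a rough illustration of how the three metrics relate, here is a minimal Python sketch. It assumes you already have counts from a reviewed validation set: responsive documents the term hit (true positives), non-responsive documents it hit (false positives), and responsive documents it missed (false negatives). The function name and the example counts are hypothetical and are not taken from any particular review tool.

```python
def validation_metrics(true_positives: int, false_positives: int, false_negatives: int) -> dict:
    """Compute precision, recall, and F1 from reviewed-document counts.

    All three inputs are assumed to come from a human-coded validation set.
    """
    retrieved = true_positives + false_positives    # everything the search term hit
    responsive = true_positives + false_negatives   # everything that was actually responsive

    # Guard against division by zero when a term hits nothing
    # or the set contains no responsive documents.
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / responsive if responsive else 0.0

    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: a term hits 1,000 documents, 300 of which are responsive,
# and misses another 100 responsive documents.
metrics = validation_metrics(true_positives=300, false_positives=700, false_negatives=100)
print(metrics)
```

In this made-up example, precision is 30% and recall is 75%: the term finds most of what matters but pulls in a lot of non-responsive material, which is the kind of trade-off the F1 score is meant to summarize.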

Whatever review tool you use should give you easy access to these three baseline metrics, in addition to total documents, unique counts, and similar figures.

Practical Tips and Strategies for Search Term Validation in eDiscovery

  1. Have a comprehensive understanding of the case. Know the legal requirements, case objectives, and data sources. Understanding the context lets you select appropriate terms and judge the quality and comprehensiveness of the output when you see it.
  2. Constantly monitor your reporting. Check on responsive rates, how many documents are reviewed in an hour, and how many documents are being produced versus marked responsive. At the end of review, go back and evaluate the processes, looking for areas to improve efficiency and for things you could have done differently.
  3. Start with broad search terms and iterate. Even before you talk to your case team, work through some of those processes, test your initial searches, look at the results, and work to figure out how to refine the terms.
  4. Use statistical sampling. A sample set is a portion of the document set against which you run a search term report and evaluate the results. The results will help you decide whether you need to revise your search terms or are getting the results you expected.
  5. Rely on your experts to help evaluate the level of precision and recall to obtain an acceptable, accurate score. Timing and volume of data are major factors in document review and your experts will help you decide what processes to apply to get the best results within your case deadline.
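The statistical sampling tip above can be sketched in Python as follows. This is only an illustration under stated assumptions: documents are drawn by simple random sampling, reviewers code each sampled document as responsive or not, and the confidence interval uses the normal approximation at roughly 95% (z = 1.96). The function names, the sample size, and the counts are hypothetical; your experts may prefer a different sampling design or interval method.

```python
import math
import random


def draw_sample(doc_ids: list, sample_size: int, seed: int = 42) -> list:
    """Draw a simple random sample of document IDs without replacement.

    A fixed seed makes the draw reproducible, which helps when you need
    to explain later how the sample was selected.
    """
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))


def estimate_responsive_rate(responsive_in_sample: int, sample_size: int, z: float = 1.96) -> tuple:
    """Estimate the responsive rate and a normal-approximation confidence interval.

    Returns (rate, lower_bound, upper_bound), with bounds clamped to [0, 1].
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rate = responsive_in_sample / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical example: sample 400 documents from a 50,000-document hit set,
# and suppose reviewers mark 120 of the sampled documents responsive.
sample = draw_sample(list(range(50_000)), sample_size=400)
rate, low, high = estimate_responsive_rate(responsive_in_sample=120, sample_size=len(sample))
print(f"Estimated responsive rate: {rate:.1%} (about {low:.1%} to {high:.1%})")
```

If the estimated range is well below what the case team expected, that is a signal to revise the search terms before committing to a full review.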

Data visualization techniques, or presenting data in various formats, help you gain insight into performance, identify patterns, and see the distribution of relevant documents across your sets. They can help you understand and interpret the outcome of search term validation. Your experts should present data in a variety of formats that help you easily understand results and make decisions on how to proceed with future cases.

To learn more about search term validation, watch our recent webinar. To ask questions about search term validation in eDiscovery, contact us today.

Lindsey Hodge manages the e-discovery department for Hall Booth Smith, P.C., and contributed to this blog post.