Random Sample

Unlike many other platforms, DISCO AI does not require you to first “train” the system. However, we do recommend a statistical sample as a good first step in any review, especially if you intend to leverage DISCO’s predictive analytics. Even if you decide not to use predictive analytics, samples can be a helpful tool in any workflow, for several reasons.

  • Taking a sample allows you to forecast the prevalence of responsive documents across your review population, along with privilege and issue coding.
  • It can also give you a representative “preview” of your data set, with insight into key players and issues in the review.
  • If you choose to use predictive analytics to shorten your review, a sample adds defensibility to that choice.

To start, select the confidence level and margin of error you want for your results. We generally suggest aiming for a 95% confidence level with a 2% margin of error. To calculate the precise number of documents you will need to review, you can use any statistical calculator, including the one built into DISCO’s sampling feature.
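As a rough illustration of what such a calculator does, the sketch below applies the standard sample-size formula for a proportion with a finite population correction. The function name, the rounded z-scores, and the worst-case prevalence of 50% are assumptions made for illustration, not DISCO's documented method, and DISCO's built-in calculator may round differently. The 1,000,000-document population is the figure used in the example later in this section.

```python
# Rounded z-scores (an assumption; other calculators may carry more digits).
Z_SCORES = {0.95: 1.96, 0.99: 2.58}


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.02, p: float = 0.5) -> int:
    """Estimate how many documents to sample from `population`.

    p = 0.5 is the worst-case prevalence, which yields the largest sample.
    """
    z = Z_SCORES[confidence]
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction shrinks the sample slightly.
    n = n0 / (1 + (n0 - 1) / population)
    return round(n)


print(sample_size(1_000_000))  # 2395
```

Rounding up rather than to the nearest whole number is the more conservative choice and would add at most one document.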

DISCO supplies search syntax that quickly gathers a random sample of documents: sample({size}, {population}). For example, if the documents slated for review are in a folder called “Potentially Responsive,” the syntax to pull the statistical sample would be: sample({size}, folder(“Potentially Responsive”)). To pull the sample from the entire database instead, the syntax would be: sample({size}, all). We recommend foldering your search results so that you can quickly return to your exact sample set in future review.

Once the review of your sample set is complete, you can make forecasts about your document population. Suppose 1 million documents are slated for review. To achieve a 95% confidence level with a 2% margin of error, a random sample of 2,395 of those documents would need to be reviewed. After reviewing that sample, the review manager has a target range for the likely number of responsive documents in the 1 million document population.

For example, if 17% of the sampled documents were tagged Responsive, one could anticipate, with 95% confidence (the confidence level chosen for the sample), that between 15% and 19% of the underlying population, or between 150,000 and 190,000 documents, would be responsive.
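The arithmetic behind that range can be sketched in a few lines. The function name is hypothetical; the inputs are the figures from the example above.

```python
def forecast_range(sample_rate: float, margin: float, population: int):
    """Project a sample prevalence, plus or minus the margin of error,
    onto the full review population."""
    low = max(0.0, sample_rate - margin)
    high = min(1.0, sample_rate + margin)
    return round(low * population), round(high * population)


low, high = forecast_range(0.17, 0.02, 1_000_000)
print(f"{low:,} to {high:,} responsive documents")  # 150,000 to 190,000 responsive documents
```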

Targeted Review

With those numbers in mind, one can begin the review, using DISCO AI alongside one or more traditional methods. One suggestion is to begin with “obvious” or “precise” keyword searches or search strings, such as the distinctive name of the project, product, or contract at issue in the litigation, or with a linear review of the most critical dates or custodians, and then to sort those search results using DISCO AI.

Document Review

Workflow Review Stages should be used for any review in which you intend to do one or more of the following:

  • Make the most powerful use of DISCO AI
  • Track your review and forecast when it will be complete
  • Divide review into batches
  • Organize levels of review (First Pass, Second Pass, Privilege Review, etc.)

The configuration of your Review Stages determines not only which documents are reviewed, but also how they are ordered within batches and what coding decisions can be made.

In our example, once the lawyer has exhausted targeted review methods, the next step is to review documents in the order of DISCO AI’s responsiveness predictions, using a managed review combined with DISCO’s “just in time” batching.

DISCO uses unique “just in time” batching. Batches are generated at the moment a reviewer requests a new batch.  This reduces batch-administration time tremendously and makes it easy to always keep reviewers working on your highest-priority data. For example, you can quickly re-prioritize the order of custodians in your review without having to remove and re-generate a set of batches. New batches will reflect your updated criteria without causing any administrative disruptions.

Just-in-Time Batching is particularly helpful when conducting a predictive prioritized review. If you prioritize AI-recommended tags for a certain issue, DISCO will continuously push the most highly recommended documents to the front of your review. Furthermore, as DISCO’s predictions get stronger, each new batch checked out will contain the highest-scoring documents among those that remain unreviewed.
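To make the idea concrete, here is a minimal sketch of batching at request time. It is not DISCO's implementation: the Doc class, its fields, and next_batch are hypothetical names, and the scores are invented. The point is only that a batch built at the moment of the request reflects whatever the scores are at that moment.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    score: int                 # hypothetical AI score for the prioritized tag
    reviewed: bool = False
    checked_out: bool = False


def next_batch(docs, batch_size):
    """Build a batch when a reviewer asks for one, taking the
    highest-scoring documents that are still available."""
    available = [d for d in docs if not d.reviewed and not d.checked_out]
    batch = sorted(available, key=lambda d: d.score, reverse=True)[:batch_size]
    for d in batch:
        d.checked_out = True
    return batch


docs = [Doc(1, 40), Doc(2, 95), Doc(3, -10), Doc(4, 70)]
first = next_batch(docs, 2)    # documents 2 and 4
docs[2].score = 99             # document 3 is rescored before the next request
second = next_batch(docs, 2)   # documents 3 and 1, using the updated score
print([d.doc_id for d in first], [d.doc_id for d in second])
```

Because nothing is pre-generated, changing the priority order requires no batches to be withdrawn or rebuilt.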

Continuous Learning

DISCO’s predictions are powered by continuous learning, meaning that DISCO learns from each tag you apply and continues to learn throughout the course of your review. DISCO AI scores each tag prediction from -100 to +100, depending on how unlikely or likely the system believes you are to apply a given tag to a given document. Because the scores are continuously recalculated as new tags are applied, you are always working with the most current AI model.

DISCO’s AI model relies on “positive” and “negative” signals. A positive signal is sent when a tag is applied to one or more documents (including during bulk tagging). A negative signal is sent when the system detects the choice NOT to apply a tag. This is captured only when another tag IS applied. (Example: A reviewer has Issue A, Issue B, and Issue C on their coding pane. The reviewer tags only Issue A. This sends a positive signal for Issue A and negative signals for Issue B and Issue C.) Unlike positive signals, negative signals are not captured in bulk-tagging operations. Because DISCO AI works best when it receives both positive and negative signals, we recommend calibrating your review, and your coding pane, with this dynamic in mind.
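The rules above can be restated as a short sketch. This is only an illustration of the described behavior, not DISCO's code; the function name and parameters are hypothetical.

```python
def tag_signals(pane_tags, applied_tags, bulk=False):
    """Return (positive, negative) signal sets for one coding action.

    - Every applied tag sends a positive signal, including in bulk tagging.
    - Tags on the coding pane that were left unapplied send a negative
      signal only if at least one other tag was applied, and never
      during a bulk-tagging operation.
    """
    positive = set(applied_tags)
    negative = set()
    if positive and not bulk:
        negative = set(pane_tags) - positive
    return positive, negative


pane = ["Issue A", "Issue B", "Issue C"]
print(tag_signals(pane, ["Issue A"]))             # positive: Issue A; negative: Issue B, Issue C
print(tag_signals(pane, ["Issue A"], bulk=True))  # positive: Issue A; no negative signals
print(tag_signals(pane, []))                      # no signals at all
```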

Validation Sample

As your review progresses, use the information generated from the Statistical Sample, along with DISCO’s tag predictions, to help determine when the review is complete. For example, if the number of responsive documents found so far is close to the number forecast from the prevalence within your sample, or if DISCO no longer recommends any additional documents (per the predictive scoring), consider running a Validation Sample of the remaining unreviewed documents.

In our example, let’s assume, for round numbers, that the review found 155,000 responsive documents and, along the way, 115,000 non-responsive documents, leaving 730,000 of the 1 million documents unreviewed.

For the Validation Sample, you may want a higher confidence level and a lower margin of error, since you may use this sample to defend your review. A reasonable target might be a 99% confidence level with a 2% margin of error. In our example, this would require a random sample of 4,137 documents from the “population” of 730,000 unreviewed documents. Again, you can use DISCO’s sample search syntax and Review Stages to conduct your validation review. Your findings will help you determine whether you need to continue the review, whether you can defensibly stop the review, or whether you want to complete the review using a more cost-effective team (since it is likely that most of the Responsive documents have been located).
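The 4,137 figure can be reproduced with the standard sample-size formula for a proportion, with a finite population correction. The z-score of 2.58 and the worst-case prevalence of 50% are assumptions chosen because they match the figure above; a calculator using a more precise z-score would return a slightly smaller number.

```python
population = 730_000   # unreviewed documents in the example
z = 2.58               # assumed rounded z-score for 99% confidence
margin = 0.02          # 2% margin of error
p = 0.5                # worst-case prevalence (largest sample)

# Sample size for an effectively infinite population.
n0 = z ** 2 * p * (1 - p) / margin ** 2
# Finite population correction for the 730,000-document population.
n = n0 / (1 + (n0 - 1) / population)

validation_sample = round(n)
print(validation_sample)  # 4137
```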


Let's assume the lawyer found that approximately 1% of the validation sample was in fact responsive (that is, 41 documents in the sample were responsive). With those numbers in mind, the question is what to do. Should the review continue? Can one defend a decision to stop reviewing?
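One input to that decision is a projection of the validation result onto the unreviewed population. The sketch below uses the figures from the example. The variable names are ours, and the terms "elusion" (the responsive rate among unreviewed documents) and "recall" (the share of all responsive documents already found) are common review metrics rather than terms taken from the text above. The projection informs the decision but does not by itself settle whether stopping is defensible.

```python
reviewed_responsive = 155_000     # responsive documents already found
unreviewed = 730_000              # documents never reviewed
validation_sample_size = 4_137
sample_responsive = 41            # responsive documents in the validation sample
margin = 0.02                     # margin of error chosen for the sample

# Responsive rate observed in the validation sample (about 1%).
elusion = sample_responsive / validation_sample_size

# Point estimate and upper bound for responsive documents still unreviewed.
projected_remaining = round(elusion * unreviewed)
upper_remaining = round((elusion + margin) * unreviewed)

# Share of all responsive documents already found, at the point estimate.
estimated_recall = reviewed_responsive / (reviewed_responsive + projected_remaining)

print(f"Projected responsive documents remaining: {projected_remaining:,}")
print(f"Upper bound at the margin of error: {upper_remaining:,}")
print(f"Estimated recall so far: {estimated_recall:.1%}")
```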
