The Flaws of Using Search Methods in E-discovery


TASA ID: 1793


The document review industry has used search methods for various purposes. The use of search methods has been validated by flawed validation methods. I could show that the validation methods that have been used are flawed because of the interference of network-based distributed review models with performance, reviewer qualification mismatches, the use of tag counts in validation, and the misuse of statistical methods. Statistical methods, which always involve applying small-probability theory to low-frequency, high-risk problems, are by themselves sufficient to make most search results invalid. The flaw in using statistical methods in litigation is similar to using small-probability theory to address risks in the aviation industry, which would lead to hull losses.

Below, I conduct a brief analysis of two well-known search methods that have been widely used to generate document pools for human review.

Current Search Methods

A search method is used to pull potentially responsive documents for human review. It may also be used to rank documents by review priority. Search methods are of the following two kinds.

1. Key Search Method

In document review, this kind of search method is used to search documents and pull those containing at least one search key to form a document pool for human review. After a search is done, the documents with hits are reviewed by document reviewers, and the documents that do not contain any search key are rejected as junk. The validity of the search method thus rests upon the assumption that all truly relevant documents contain one or more of the search keys or key combinations, that any errors of rejecting relevant documents are harmless, or that the risk from an under-inclusive search is small enough for litigants to accept.
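The culling step described above can be sketched as follows. This is only a minimal illustration: the function name `key_search`, the sample documents, and the choice of case-insensitive whole-word matching are assumptions, not the behavior of any particular review platform.

```python
import re

def key_search(documents, keys):
    """Split documents into a review pool and a rejected set.

    A document enters the pool if it contains at least one search key
    (case-insensitive, whole-word match); otherwise it is rejected.
    """
    patterns = [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keys]
    pool, rejected = [], []
    for doc_id, text in documents.items():
        if any(p.search(text) for p in patterns):
            pool.append(doc_id)
        else:
            rejected.append(doc_id)
    return pool, rejected

# Hypothetical sample documents and keys, for illustration only.
documents = {
    "doc1": "Please wire the payment before Friday.",
    "doc2": "Our counsel gave legal advice on the contract.",
    "doc3": "OK with me. Go ahead.",
}
keys = ["payment", "legal advice"]
pool, rejected = key_search(documents, keys)
```

In this sketch the short approval message in `doc3` is rejected without ever being seen by a reviewer, which is exactly the risk the assumption above has to absorb.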

A search method may also be used to capture certain word and phrase patterns, such as two words appearing within a certain distance of each other in the searched text.
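A distance pattern of this kind might look like the sketch below. The function name `within_distance` and the convention of counting the words between the two terms are assumptions made for illustration; real search tools define proximity in their own ways.

```python
import re

def within_distance(text, word_a, word_b, max_gap):
    """Return True if word_a and word_b occur with at most max_gap words between them."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    # abs(i - j) - 1 is the number of words lying between the two positions.
    return any(i != j and abs(i - j) - 1 <= max_gap for i in pos_a for j in pos_b)

sentence = "The legal team gave advice yesterday."
near = within_distance(sentence, "legal", "advice", 3)       # two words lie between
too_far = within_distance(sentence, "legal", "advice", 1)
no_key = within_distance("Ask Jack for an explanation.", "legal", "advice", 3)
```

The last call shows the limitation: a request for advice that uses neither word is invisible to the pattern, however the distance is set.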

2. Prioritized Review Method

Law firms normally want to get the most relevant documents as early as possible in a representation. If a search method were truly able to find most or all important documents, it would give the law firm a strategic advantage at the earliest stage. This can help the law firm develop a litigation strategy for potential settlement negotiation or adjudication.

In response to this need, the review industry has developed what is called the prioritized review method. Under this method, a set of search keys is used to find potentially relevant documents among all potential documents; the documents found are divided into a large number of batches according to their priority rankings, and the batches are assigned to document reviewers. Each found document or family of documents is assigned a priority ranking number. Documents with the same or similar priority rankings are placed in the same batch or batches. A priority ranking may be based upon the appearance of certain keys, the hit frequencies of certain keys, the total hits of some or all keys, or any combination of these. More broadly, a priority ranking value may be a function of the appearance of keys, the frequencies of one or more keys, the frequencies of certain key combinations or patterns, and so on. A ranking value may also be called a ranking index, ranking coefficient, or ranking code. The search keys for culling documents are most probably formulated from brief findings in the preliminary case assessment.
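A minimal sketch of such a ranking follows, assuming the ranking value is a weighted sum of key hit counts. The weights, the function names `rank_score` and `make_batches`, and the sample documents are hypothetical; actual ranking functions vary by vendor and are not described in detail here.

```python
import re
from collections import Counter

def rank_score(text, key_weights):
    """Ranking value: weighted sum of how often each single-word key appears."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return sum(weight * counts[key] for key, weight in key_weights.items())

def make_batches(documents, key_weights, batch_size):
    """Drop documents with no hits, sort the rest by score, and cut into batches."""
    scored = [(rank_score(text, key_weights), doc_id) for doc_id, text in documents.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by id
    ranked_ids = [doc_id for _, doc_id in hits]
    return [ranked_ids[i:i + batch_size] for i in range(0, len(ranked_ids), batch_size)]

# Hypothetical documents and weights, for illustration only.
documents = {
    "a": "payment payment wire",
    "b": "wire",
    "c": "go ahead",
    "d": "payment due",
}
key_weights = {"payment": 2, "wire": 1}
batches = make_batches(documents, key_weights, batch_size=2)
```

Document `c` scores zero and never reaches a batch, so the ranking inherits the same under-inclusiveness as the plain key search.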

The prioritized review method necessarily rests upon the same assumption as the key search method. It is believed to help law firms reduce the number of documents to be reviewed by humans.

Search Methods Are Unable to Capture the Following Subjects

Search keys cannot be formulated to reach the full scope of all responsive documents. This should be presumed to be the case; the only difference is how well a particular search method and a particular search key set perform. Some obvious problems are as follows:

  1. A search method is unable to capture all responsive materials because of great expression diversity. The number of ways to express the same concept can be so large that it is unrealistic to expect a few search keys to capture all documents.
  2. A search method is unable to capture all responsive materials if it cannot recognize words and phrases because of problems in the text.
  3. A search method cannot accurately recognize all responsive materials if documents are created as image files, especially scanned image files.
  4. A search method cannot capture responsive documents that exist as isolated documents, use words without antecedents, or incorporate critical information by context or prior understanding.
  5. A search method cannot capture responsive documents whose character encoding differs from that of the search key. This problem is serious in foreign-language documents, which use different encoding schemes.
  6. A search method cannot correctly detect materials in documents that were written to preserve secrecy. In this case, even licensed attorneys cannot always understand the documents, but they may at least detect them.
  7. A search method cannot correctly capture all documents that are intended to be only part of the documents for a particular communication cycle.
  8. A search method cannot reliably capture all documents reflecting a complex and sophisticated scheme.
  9. A search method cannot reliably capture all documents written in coded language.

Below, I provide specific reasons why a search method is unreliable for these subjects.

1. Expression Diversity and Probability Bias

The number of ways of expressing something is presumed to be very large. This can be seen in a simple writing test: a population of people is asked to write five hundred words about an incident. Although certain keys are used at high frequencies, there are always ways to avoid them. For example:

“Car accident” may be referred to as crash, contact, event, or incident, or by none of these words. If information can be carried by context, intended readers can understand it through many ways of expression.

“Legal advice” is often used in privileged documents. A person's intention for seeking legal advice can be expressed in a large number of alternative ways such as help, shed some light for me, ask X for an explanation, according to his words, based upon Jack's email, the instruction, his message, someone's voice mail, instant message, X's tweet, someone's teaching, someone's warning, and someone's document. This is only a small part of all possibilities. The only way to get all such documents is to actually review them. 

Payment of money is the most critical element in any case where money is improperly paid. There must be some kind of evidence showing the payment. It can be expressed as wire, transfer, mail check, the fee, an amount, or the number (without a unit); there are all kinds of implied ways to refer to a payment of money.

If an author expresses a concept using only the most common words, the number of ways of expression is limited (most likely, several to tens). What complicates the matter is an unknown number of vague expressions, secondary expressions, and implied meanings. Such expressions have very strong cultural characteristics. In a negotiation over the amount of a bribe, one litigant offers 30,000 (no unit) to a witness, who previously gave false testimony, for giving truthful testimony. The witness counters with “six six da shun” (meaning that six and six is the most smooth or auspicious). The witness is saying that six is the better number for overturning his words; any reviewer can understand that the witness wants to double the amount of the bribe.

In another scenario, a person writes a vague letter to a company's employees attacking a government official who has been convicted of accepting bribes. It is well-known news, however, that the official accepted a bribe from the same company. Thus, the letter actually attacks certain key employees of the company by implication. Letters of this kind are often seen in some cultures. This letter is clearly a complaint about the company’s leadership for taking bribes. This is not the only way to make a complaint. One can express anything by citing a relevant historical story, a famous person, an event, a scandal, a time, or a political age to mean something about a person, a thing, an event, a time, or a political system. For example, comparing a government official with a well-known bribe taker in an ancient dynasty may mean that the official has accepted a bribe. Given the amount of such material available in Eastern culture, the number of potential expressions is gigantic. It is impossible to formulate keys that encompass this type of material.

A search method favors long documents. Any concept may be expressed in several to hundreds of ways. Suppose a document comprises a number of information units (each of which may be a noun, subject, object, action, or certain property), each unit can be expressed in different ways, and search words are drawn randomly from the entire vocabulary. The chance of capturing the document then increases with the number of units: the probability of getting the document is the probability that at least one of its words (or patterns) is among the selected N keys (or patterns). Thus, if a document has two words that are treated as randomly selected from a vocabulary space, the chance that one or both of them are search keys would be extremely small. When a document contains a large number of words, which can appear at any frequency, the chance of getting the document with arbitrarily selected words is much higher. When a document is very long, it may be captured with a probability near unity. While search keys are not selected randomly, they may be treated as random keys as far as certain non-responsive materials are concerned. Thus, the search method always gets large documents that have nothing to do with the client or the legal issues. Anyone can verify this finding by analyzing a search result at a review site.

Search keys, however, are not selected randomly. They are formulated so that they capture relevant documents with much higher probability. For documents containing those keys, the search method will get them as long as one or more search keys are used in the documents. In other words, search keys are formulated in favor of capturing responsive documents. A search method with proper search keys tends to capture a high portion of relevant documents.

As argued above, many information units can be expressed in more than one way because of expression diversity and alternative expressions that cannot be foreseen by those who formulate search keys. Therefore, the intended probability favoring the capture of relevant materials loses its force. Probabilities can still be used to estimate the overall chance or tendency of capturing documents. Assume that a document contains only five information units, that each information unit can be expressed in 30 ways, that the information units are uncorrelated and thus independent of each other, and that a single hit on any information unit will get the document. The overall chance is then some type of “sum” of the individual probabilities of hitting the information units. Obviously, the chance of capturing a document increases with the number of information units. The probability is nearly unity for all very long documents.
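One way to make this cumulative probability concrete is sketched below. It relies on the independence assumption stated above and on one further assumption that is not in the analysis itself: that the key set happens to cover a fixed number of the 30 possible expressions of each unit, and that each expression is equally likely. Under those assumptions the chance of at least one hit is one minus the chance that every unit misses.

```python
def capture_probability(num_units, ways_per_unit, ways_covered):
    """Chance that at least one information unit is hit by a search key.

    Assumes independent units, each expressed in one of `ways_per_unit`
    equally likely ways, of which the keys cover `ways_covered`.
    """
    p_unit = ways_covered / ways_per_unit          # chance that a single unit is hit
    return 1 - (1 - p_unit) ** num_units           # complement of "every unit misses"

# Hypothetical case: the keys cover 1 of the 30 expressions for each unit.
short_doc = capture_probability(num_units=2, ways_per_unit=30, ways_covered=1)
five_units = capture_probability(num_units=5, ways_per_unit=30, ways_covered=1)
long_doc = capture_probability(num_units=200, ways_per_unit=30, ways_covered=1)
```

With these illustrative inputs the probability rises steadily from the two-unit document to the five-unit document and approaches unity for the 200-unit document, which is the length bias described in the text.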

Because of this cumulative nature of the probability, a long document containing thousands of information units (or words) is captured with higher probability. In contrast, it is much less likely that a search will capture short documents containing only a few words, like “I agree,” “You get it,” “OK with me,” “Do it on your risk,” “will sign on,” “go ahead,” “No problem,” and “Love the deal.” For a short document like these, none of its common words can be used as a search key. If a large number of such common words were used as keys, a search would get nearly all of the documents that are meant to be excluded. In other words, using a large number of popular common words defeats the purpose of using the search method. The above analysis shows that the search method cannot reliably capture short documents by finding common words in their texts.

Relevant matter in a document may comprise only several information units, as in “X provides legal advice to Y,” “X makes payment to Z,” or a technical report discussing certain technologies. Enlarging the set of search keys would improve the chance of capturing such documents. However, the inclusion of a large number of common words would get practically all documents and thus make the search method useless. One potential solution to this problem is to use a pattern search, but it would be very hard to find all of the possible expression patterns.

An improved chance of catching long documents cannot be used to prove the validity of the search method. A long document can be viewed as a large number of expressions, each of which comprises several information units. The higher probability results from the appearance of many common words. For example, “legal” is used in all kinds of contexts having nothing to do with legal advice: it is used in (1) all email disclaimers, (2) all software license agreements, (3) all other signed agreements, (4) all news and reports concerning legal issues, (5) customer complaints containing legal matters, (6) stock analysis reports, and (7) all reports regarding legal issues. “Legal” is a common word that is used in many documents, but not in all documents. Popular common words may appear in documents for unrelated purposes. A search using a large number of common search keys (k1, k2, ..., ki) can capture a large number of long documents, thereby defeating the purpose of the search.

Based upon the above comparative analysis, one can immediately see that the search method has a probability bias in favor of long documents and tends to discard short documents, especially important short emails that cannot be captured by their authors' identities. Unfortunately, important documents concerning business transactions are very often short, informal, and vague emails, sometimes with the sender's identity hidden. Thus, search methods should not be used in cases where transaction decisions are important.

When a document is written in a foreign language, expression diversity may be a much more serious problem because of different words, phrases, and sentence structures, but the same probability pattern applies. The search method is incapable of getting short documents written in common words and documents whose critical meanings are carried by context or by a related concept.

2. Word and Character Recognition Problems

The search method may be unable to identify materials when any of the following is true: (1) the handwriting is illegible (hard even for humans), (2) the written text has irregular spaces, (3) the words in the text that correspond to search keys are misspelled, (4) the relevant text contains invisible characters such as control characters, special characters, and format characters, or (5) the search keys and the critical words in the text use different encoding schemes. For this class of documents, the design of the search algorithm and the quality of the optical character recognition (“OCR”) software can directly affect the search result. The impact can be unpredictable: it may ruin one kind of document, all documents created by one author, or all documents processed by one vendor.
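Several of these failure modes can be reproduced with a plain substring match, as in the sketch below. The sample sentences are invented for illustration, and an exact substring test is an assumption about the search algorithm; a more tolerant algorithm might recover some of these cases.

```python
key = "payment"

# Hypothetical text samples; the last two contain characters that render invisibly.
samples = {
    "clean": "The payment was wired.",
    "irregular spaces": "The pay ment was wired.",
    "misspelled": "The paymnet was wired.",
    "zero-width space": "The pay\u200bment was wired.",  # U+200B inside the word
    "soft hyphen": "The pay\u00adment was wired.",       # U+00AD inside the word
}

# Exact substring matching: True only where the key appears byte for byte.
results = {label: key in text for label, text in samples.items()}
missed = [label for label, found in results.items() if not found]
```

Only the clean sample is found. The two samples with invisible characters look identical to the clean one on screen, so a reviewer who saw them would read “payment” without difficulty, yet the search never shows them to a reviewer.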

When the original documents are handwritten notes, they are scanned and converted into text by OCR. The converted text is generally very poor, and some of it may be junk. A meaningful search cannot be conducted on such text.

3. Documents in Images

Documents may be created in an image format such as PDF, TIFF, or another graphical format. A PDF file may be generated from a native file or from a scan. Native PDF files can be searched accurately, but scanned PDF files often yield poor, broken, or garbled text after they are converted into text files. Generally, a meaningful search cannot be performed on text that originates from paper documents. However, search performance may improve as technology progresses.

For other graphic files, the accuracy of a search would depend upon file formats, how files are created, and how texts are searched.

4. Lack of Antecedents and Contextual Deficiency

A very common problem is the passing of information by context or prior understanding. When critical information is passed as contextual information, the author does not need to use any of the common words to express it. A statement such as “let go,” “OK,” or “you get it” (or virtually anything else) can serve to approve a transaction about which the author was asked earlier. When the contextual information does not appear in the current email or document, there is no good method for capturing this type of document.

Unspecified antecedents are a very common problem in most documents. This problem has a lot to do with the purpose of corporate documents. Documents are created to get business done by a group of employees who know everything concerning the business. The writing is informal most of the time. There is no need to state the background, purposes, current standing, related activities, involved staff, external facts, relevant business, company history, or legal environment unless it is necessary. Employees write documents to address the issues they want to address. A vast number of documents, standing alone, are out of context. An email may be one sentence: “Have we filed appeal with the court?” “We visited their headquarters to see the facilities.” “The board has approved the deal.” The problem is common even in long documents. For example, a very long table contains several columns of numbers without any explanation; long reports contain all kinds of analysis data but do not indicate their uses; a long email does not reveal why its authors discuss the subject, whom they represent, or how the subject is related to other subjects (all of that information might be in separate emails). There are all kinds of scenarios in which documents become undecipherable for lack of a large number of antecedents. Document reviewers can recognize all of the words and phrases but cannot understand the precise meaning in some aspects important to their discovery decisions.

Documents often reveal their true meanings only when they are read in light of other related documents. Most search keys cannot capture such documents.

5. Character Encoding Problems

Documents may be created with text editors and various word processors. Two common practices are copying text from one document to another and using a document from any source as a starting template to save time. Each letter or character in a file is represented by a binary value. Text files also contain special characters, control characters, and sometimes format characters. When those files are saved, read, copied, and transmitted, those binary values should be preserved. However, some programs may change the binary values. If a change affects format characters, it may affect the search when the search algorithm uses format in its matching method.

Serious problems can be seen in foreign-language documents. Many foreign languages may be written in different encoding schemes. A letter or character may be encoded in different lengths and with different binary values. When a program reads, opens, displays, and saves a file, it must recognize the encoding scheme of the file before it can process the file correctly. Not every program is capable of identifying every encoding scheme used in a file and processing it correctly. If the text in a file is created according to one encoding scheme but a program processes the file using a different encoding scheme, the program will alter the text. If the altered file is saved, the program permanently damages the file. That is why many files contain wrong, strange, or garbled characters.
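The damage described above can be illustrated with a short sketch. The French sample phrase and the particular codec pairing (UTF-8 bytes misread as Windows-1252, or read as ASCII with replacement) are assumptions chosen for illustration; the same effect occurs with many other encoding pairs.

```python
original = "paiement re\u00e7u"          # "payment received" with a c-cedilla
raw_bytes = original.encode("utf-8")     # how the file is actually stored

# A program that wrongly assumes Windows-1252 decodes without any error
# but shows (and may later save) different characters.
misread = raw_bytes.decode("cp1252")

# A program that supports only ASCII and substitutes unknown bytes loses
# the original values for good once the result is saved.
lossy = raw_bytes.decode("ascii", errors="replace")

key = "re\u00e7u"
found_in_original = key in original
found_in_misread = key in misread
found_in_lossy = key in lossy
```

In the first case the word is still recoverable by someone who knows which two encodings were confused; in the second case the replacement characters carry no trace of the original bytes. In both cases a search for the correctly spelled key finds nothing.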

Writing several languages in one file, or using mixed encoding schemes within a file in a single language, can cause additional problems. A reading, writing, or rendering program may be unable to recognize several encoding schemes for text in one or more languages within the same document. After such a file is opened and saved, some of the foreign-language text may become garbled. The text may be altered even though it appears to be the same.

When a document contains text in several languages, the text may have been created by copying from other sources or old documents. Thus, the text or some part of it may use different encoding schemes. The copied text may not comply with the Unicode standard or with the encoding scheme used in the document, so the document contains mixed encoding schemes. One way in which encoded text gets altered is when a program that does not support multiple encoding schemes is used to process the document. The program may alter the portion of the text encoded in a scheme that the program does not support. When the document is saved, this part of the text is altered forever. Other programs will not be able to process the altered text after the document is saved. This explains the common phenomenon that some documents contain garbage in part of the text, in the text of one or more languages, or in some text within the document.

Although Unicode is intended to eliminate these problems, many old text files that do not comply with the Unicode standard still exist, and they may be used as sources for copying. In addition, a large number of old document templates and model documents are still used as starting documents to save time. A large number of legacy programs, including word processors, email clients, text editors, and web browsers, are still in use. They are unable to correctly detect Unicode and other encoding schemes. The differences among encoding schemes can be very small. Sometimes a change of encoding scheme affects only certain characters. The results can be very strange. In some cases, only one in every ten characters is altered. In other cases, only certain letters or characters are altered without any apparent sign. In the worst case, the program may alter only invisible format characters.

A search algorithm will fail to find intended words and phrases if (1) the encoding scheme of the text differs from the encoding scheme of the search key, (2) the encoding scheme of the text has been changed, (3) the search algorithm supports only certain encoding schemes while the text uses an encoding scheme that does not comply with the Unicode standard, or (4) certain binary values have been altered, making a match impossible. For those reasons, various obvious words in the text cannot be “found.”
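Two of these mismatches can be shown in a few lines. The German sample sentence is invented, and the second half of the sketch uses Unicode normalization, which is introduced here only as an illustration of text that looks the same but has different binary values; it is not something the analysis above prescribes.

```python
import unicodedata

key = "\u00dcberweisung"                          # "wire transfer", precomposed U-umlaut
text = "Die \u00dcberweisung ist erfolgt."

# (1) Key and text in different encoding schemes: a byte-level match fails.
key_bytes_utf8 = key.encode("utf-8")
text_bytes_utf16 = text.encode("utf-16-le")
byte_hit = key_bytes_utf8 in text_bytes_utf16

# (4) Same visible characters, different binary values: the text stores the
# umlaut as "U" plus a combining diaeresis (decomposed form).
decomposed_text = unicodedata.normalize("NFD", text)
naive_hit = key in decomposed_text

# Normalizing both sides to one form before matching restores the hit.
normalized_hit = (
    unicodedata.normalize("NFC", key) in unicodedata.normalize("NFC", decomposed_text)
)
```

The decomposed and precomposed sentences display identically, so nothing on screen warns a reviewer or a key designer that the naive search has silently missed the word.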

The search algorithm of any search method affects how well the method can handle these character-matching problems. At this point, search technologies are unsatisfactory. Too little effort has been made to understand the encoding problems that occur in a large number of documents. Little effort has been made to restore the correct encoding in cases where that is possible. In the worst cases, the text in a majority of documents is garbled. In all of these situations, it is hard to argue that a search can achieve a valid result.

6. Documents Intended to Be Kept Secret

Any intended meaning can be expressed by implication, connotation, analogy, and uncommon expressions, while a transaction may be referenced by a project name, project code, transaction nickname, or all kinds of other unfamiliar words.

When a subject of discussion is sensitive, highly private, or confidential, the author may express it vaguely so that it is hard for others to understand. Any attempt to keep a subject secret can defeat the search method, because it is impossible to know what search keys would capture such expressions.

When an investigation looks into intentional wrongdoing, search methods should not be used.

7. Partial Communication

Documents may be written as an incomplete or partial communication. Such documents are only part of the “documents” in a two-way or multi-way communication cycle. The first part of the communication is normally a phone call, a personal interview, a personal or group meeting, or another separate communication (e.g., a video conference, a faxed document, or a separate email). The second part is a response to the first part. Since the first part of the communication is unknown, the second part is hard to understand. It may be a write-up requested in a prior interview, a response to a question posed in a meeting, or a report that was scheduled long ago.

Partial communication is out of context for document reviewers. Sometimes a few things can be inferred from such a document; sometimes it is impossible to make any inference. In all cases, it is hard to get all the information required for making reliable discovery decisions. A search method may completely undermine the discovery.

8. Complex and Sophisticated Schemes

One such scenario is a perpetual money-transporting cycle. An Indian company retained a consulting firm in India to connect it with government officials in order to get government contracts. The company never directly gave money to the consulting firm for improper purposes. The company and the consulting firm also had a joint venture for doing other business unrelated to the company's business. In that joint venture, the company always contributed more property and got less cash for its contribution. This actually resulted in a net cash flow from the company to the consulting firm. All transactions between the company and the consulting firm were lawful, and all commission payments were reasonable. However, a careful review of the arrangement shows that the company gave a large amount of cash to the consulting firm through the abnormal distribution of cash in the joint venture. It was also found that the consulting firm spent 10 million to bribe government officials. Now, a U.S. company conducts a due diligence review in connection with a proposed acquisition of the Indian company. In this sophisticated cash-transporting system, the company gave net cash to the consulting firm through the joint venture, the consulting firm bribed the government officials, and the officials gave business to the company. It is highly unlikely that the search method, with any search keys, could capture all required documents unless the key designers knew of this money-transporting scheme and were familiar with all of those documents.

When an improper objective is achieved through a sophisticated arrangement, the search method cannot be used to reliably capture relevant documents. In another example involving payment of money, a company may be rewarded with certain stock for donating money to other organizations in other countries. The company may not regard such stock as part of its business assets and may not even enter the stock's value in the company's books. However, employees of the company can “sell” the stock at a nominal price to a middleman, who then uses his own money to bribe government officials. It is hard to design keys to get those documents.

There are still other file formats that prevent a search method from finding responsive documents. They include encrypted files, executable applications reflecting relevant issues, audio files, and video files. 

Specific Examples Showing Search Method's Limitations

The following examples show why certain search keys cannot retrieve relevant and critical documents and thus cast doubt on the validity of the search method. When the search method is used, document reviewers cannot see the excluded documents. Thus, it is impossible to determine specific misses. The only way to assess the true effects of the search method is to conduct a comparative study by (1) reviewing all documents from all potential custodians and identifying all responsive documents (using an improved review model), and (2) determining whether all of the truly relevant documents are within the captured documents. However, such a study is impractical because of the increased costs. Therefore, indirect evidence is used to show potential problems. The following problems can be observed at review sites.

1. A large number of documents are caught because they contain one or more key player names. Whenever key player names are unclear, referred to by implication, or spelled differently, those documents cannot be captured by the players' identities. This is especially a problem for short documents containing only several common words. Popular common words generally cannot be used as search keys because they lack special meanings and appear in large numbers in documents.

2. One of the most important things is the identity of persons. Some documents do not disclose true identities: a draft document may contain no author name; documents may use private or unfamiliar email addresses, another person's email address, or some other address without disclosing anything. Some documents show external email addresses without fully disclosing the persons' names. If those documents are caught, they are caught by luck rather than by the intended keys. This is especially a problem for very short documents comprising only several common words and for documents that appear to be remote from the focus of the litigation. Additional keys and search key patterns cannot capture those documents.

3. Some documents refer to specific subjects, such as relevant products, bribes, and infringing products, by alternative, vague, or confusing words. Those documents may be rejected as junk unless they happen to be captured by other keys. The reason for this failure is that the subject is expressed not in normal language but by less common names or any of a large number of alternative expressions, such as contract parties (our deal, Blue Star deal), transaction date (11-3 agreement), contract name, product code, serial number, price dispute, control number, or tracking number. It is impossible to foresee all potential words.

4. Some documents are intended to evade keyword search. Since the search method has been known for years, search keys can be relied upon only for documents that concern “normal business products.” If a document review is intended to identify documents concerning normal transactions, the use of the search method may be justified on the basis of saving costs. However, documents may reflect a range of conduct and activities, from normal business transactions to unethical dealings, technical statutory violations, statutory violations, serious crimes (with jail time), bribery, and murder. It would be very hard to argue that, in all those scenarios, documents are created with the same degree of candor and would reveal all details. It is reasonable to argue that the search method has a diminished chance of capturing critical documents concerning abnormal conduct.

5. In cases involving corporate looting, embezzlement, stealing, conflict dealings, and bribery, the most important evidence is the payment of money and the purpose of the payment. However, those documents look very innocent. The payment entries on all kinds of documents may not reflect the payment purpose or the payees. Some documents may be lost because they do not use the most common words, such as cost, payment, amount, or dollar. There are all kinds of other ways of expressing payment without using those words. A payment transaction in an email may be referenced by dated deal, contract, translation, check number, wiring number, product name, informal party name, bank account number, or bank name. The intended readers can understand largely from the business context. It is impossible to capture all of those documents.

6. Documents concerning intention are often the most difficult to capture because the potential number of expressions is beyond imagination. Those documents, which are often critically important, have two to three elements: persons or parties, intention, and relevant subject. Personal identities may be unreliable. Intention may be expressed by consent, agree, disagree, like, dislike, OK, get, approval, yes, no, and so on. Some of these are common words that are used everywhere, and virtually any intention can be expressed by common words. Thus, those words, or a few obvious combinations of them, cannot be used to reliably capture those documents. The last element is the subject, which can also be expressed in special words, implied terms, related terms, or just by business context. Context may be established by a personal meeting, prior phone calls, a separate email, a meeting decision, previously established meeting minutes, meeting schedules, or simply working on the underlying transaction. For all of those reasons, documents concerning personal intention cannot be captured reliably.

7. At review sites, a common pattern is that a responsive document is captured only by coincidence. For example, one search key is a person's name, Football, and a responsive document is caught because it mentions a sporting event, football. The document lands in the search pool only because the person's name happens to be the same as the name of the sport. If the person had a different name, such as John Doe, this document and all related documents would have been rejected as junk. In other words, some responsive documents are captured by luck rather than by the intended likelihood of relevant search keys. In another example, some responsive documents are caught because of one or more hits on “legal” in a legal disclaimer. For these and similar documents, the chance of capture depends upon the chance that a legal disclaimer was used; they would have been rejected as junk if they did not contain one. There is no special relationship between responsive subjects and legal disclaimers.
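
The failure modes in items 3 and 7 can be illustrated with a minimal sketch of a key search. The documents, the search keys, and the function name `key_search` are all hypothetical, chosen only to mirror the examples in this article. The sketch assumes a simple case-insensitive whole-word match, which actual review platforms may implement differently.

```python
import re

# Hypothetical search keys; "football" stands in for a custodian named Football.
SEARCH_KEYS = ["payment", "contract", "football", "legal"]

# Hypothetical documents; "responsive" is the coding a human reviewer would assign.
DOCUMENTS = [
    {"id": 1, "responsive": True,
     "text": "Wire 4471 went out for the Blue Star deal as we discussed."},
    {"id": 2, "responsive": True,
     "text": "OK on the 11-3 agreement. See you at the football game."},
    {"id": 3, "responsive": True,
     "text": "Approved. This message may contain legal privileged material."},
    {"id": 4, "responsive": False,
     "text": "The contract for office cleaning is renewed."},
]


def key_search(documents, keys):
    """Split documents into a review pool (at least one key hit) and rejects."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE
    )
    pool, rejected = [], []
    for doc in documents:
        hits = sorted({m.lower() for m in pattern.findall(doc["text"])})
        (pool if hits else rejected).append({**doc, "hits": hits})
    return pool, rejected


pool, rejected = key_search(DOCUMENTS, SEARCH_KEYS)

# Responsive documents that never reach a human reviewer.
missed = [d["id"] for d in rejected if d["responsive"]]
# Responsive documents that were caught, with the keys that caught them.
lucky = {d["id"]: d["hits"] for d in pool if d["responsive"]}

print("missed responsive documents:", missed)
print("responsive documents caught, by key:", lucky)
```

In this sketch, document 1 describes a payment but contains none of the keys, so it is rejected as junk. Documents 2 and 3 reach the review pool only through a coincidental name match and a disclaimer word, not through any key related to their responsive subject.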

Risks from Using Search Methods

Search methods and prioritized methods do not make final coding decisions the way computer algorithms do, and thus cannot directly create risks.

Both methods nevertheless pose risks by disrupting review context and degrading the review performance of licensed attorneys. Under the current review model, licensed attorneys are not enabled to deliver their highest potential performance, but they still have some chance of catching critical documents that are definite on their face. The risks of using search methods include the following: (1) failure to capture a document revealing an attorney's affiliation may prevent reviewers from correctly identifying some privileged documents; (2) failure to capture documents disclosing prior litigation work product may prevent reviewers from identifying some work product; (3) failure to capture certain documents in a complex transaction may prevent reviewers from detecting unrelated liabilities; (4) failure to capture critical documents for a complex scheme may prevent reviewers from detecting the whole scheme; and (5) disruption of document context may prevent reviewers from identifying other risks, thereby leaving the law firm in a bad position to protect the client's interest.

If document review context is seriously disrupted by the prioritization of documents or the removal of helpful documents from the review pool, the reviewers will be in a worse position to detect many kinds of issues critical to litigation. The law firm may be unaware of the client's scandals, similar conduct, poor credibility, malice and intention, business risks, and litigation risks. The adversary, who might know of those issues from other sources or prior dealings, could surprise the court when the law firm has neither the resources nor the time to address them. It is unrealistic to tackle millions of documents on short notice. In litigation, a big surprise can change the parties' litigation postures instantly.

Risks may be divided into four types: (1) risks that have some impact on the current case only, (2) risks that can decisively defeat the client's case, (3) risks that can negatively affect the client's competitive position, and (4) risks that can increase the probability of exposing the client to additional lawsuits. Some risks are small to moderate, but others are equivalent to the hull-loss risk in the aviation industry. If a materialized risk can instantly defeat the client's case, cripple its business or significantly diminish its competitive position (e.g., help several new competitors unseat the client's dominant position), and assist other persons in initiating a chain of lawsuits (e.g., private actions, regulatory actions, consumer class actions, and shareholder actions), the risk is equivalent to the hull-loss risk in aviation because it threatens the client's existence.

Whether a risk is of the hull-loss type or of a limited nature depends on the nature of the case and the risk itself. In the above example concerning complex and sophisticated schemes, using a search method may cause document reviewers to miss the money-transporting scheme. This risk is considered a “hull loss” type for a corporate client because the company would “buy” a criminal prosecution, with a chance of exposing additional civil and criminal liabilities from its own documents. It cannot be assumed that a mass document production will not implicate export violations, antitrust, patent infringement, trade secret misappropriation, securities fraud, or other violations. By disrupting review context, the search methods can interfere with reviewers' ability to identify other critical issues, business secrets, and unrelated risks. Some exposed risks may materialize many years later, and some may hurt the client for years without the client's awareness. Only those who take advantage of the information know it. One must assume that a massive document production contains everything that could be used against the client.

Search methods and many other technologies can indirectly hurt the client through the operation of what I call a “skilled person in the subject.” Attorneys and document reviewers who are not skilled persons in the subject need full context information to conduct a full analysis and to fully understand the subject and its risks. Persons skilled in the subject, however, need only pieces of information, which they combine with other information to fully understand the subject. This is very similar to patent law's “skilled person in the art” concept, which has been used to determine an invention's merit for centuries. Both search methods can affect risk control in two different ways. Disrupting review context can cause document reviewers to fail to identify risk subjects, but it will not stop them from producing risk-carrying materials. This is primarily due to the current non-aggressive discovery practice by which all non-responsive materials in non-privileged responsive documents are produced to the adversary. This practice is in sharp contrast to the conventional discovery standard of producing only responsive materials.

From a huge production containing millions of pieces of non-responsive information, persons skilled in the subject can pick up partial information and combine it with other information they know to use against the client. Competitors might have context information from other sources that helps them figure out the client's business plans, pipeline products, and trade secrets. The leaked information may place competitors in a position to design around the client's business plans and easily take advantage of the client. The client would then have no way to compete effectively with those competitors. By a similar approach, a potential adversary can combine pieces of information with other information it has to support a new lawsuit. It is a mistaken belief that a disrupted review context or incompetent reviewers will err on the side of withholding business information and risky materials. They do err, but they err by putting far more in the responsive pile than in the non-responsive pile, especially when they must code documents as responsive whenever they cannot understand them.

In this highly competitive world, competitors and adversaries actively seek information from all sources. Considering the frequent leakage of government, intelligence, and military information, and how businesses collect intelligence, it must be presumed that producing a huge amount of non-responsive information that nobody even pays attention to is a high risk. It cannot be assumed that the massive amount of non-responsive information in hundreds of thousands of documents is harmless. Coding documents by guessing should be viewed as a dangerous game if the case is viewed as a game against the whole world.

If a collaborative review model is used to improve review accuracy, one critical step is conducting a reconciling review for all documents that have been coded tentatively (where at least one critical fact is assumed and the significance can change). The validity of that step depends upon the effectiveness of the search method. However, as demonstrated above, a search will not be able to find troublesome documents. This analysis underscores the need to map out all unsearchable documents that appear to be critical or important. This can be done by entering a record in a database table to track each suspected document. The record must contain a critical key, relevancy information and/or a risk reason, and the document location. In conducting the reconciling review, a series of searches is conducted against the tracking table in addition to the searches conducted against the documents. If all critical documents are available for review, the limitation of the search method will not affect the review product in a predictable way. However, if the search method has failed to capture some critical documents, human review cannot undo this effect. The search method could thus change the disposition of the case.
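
The tracking table described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the article specifies that each record holds a critical key, relevancy information and/or a risk reason, and a document location, but the table name `tracking`, the column names, the use of SQLite, the substring matching, and the sample documents are all hypothetical.

```python
import sqlite3

# In-memory database for illustration; an actual review would persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tracking (
        doc_location TEXT NOT NULL,  -- where the suspected document can be found
        critical_key TEXT NOT NULL,  -- key assigned by the reviewer
        risk_reason  TEXT NOT NULL   -- relevancy information and/or risk reason
    )
    """
)


def track(doc_location, critical_key, risk_reason):
    """Record a document that a reviewer flagged as critical but unsearchable."""
    conn.execute(
        "INSERT INTO tracking VALUES (?, ?, ?)",
        (doc_location, critical_key, risk_reason),
    )


def search_tracking(term):
    """Locations of tracked documents whose key or reason mentions the term."""
    like = f"%{term}%"
    rows = conn.execute(
        "SELECT doc_location FROM tracking "
        "WHERE critical_key LIKE ? OR risk_reason LIKE ?",
        (like, like),
    )
    return [row[0] for row in rows]


def search_documents(documents, term):
    """Plain case-insensitive substring search against the document text."""
    return sorted(loc for loc, text in documents.items()
                  if term.lower() in text.lower())


def reconciling_search(documents, term):
    """Union of hits in the documents and hits in the tracking table."""
    return sorted(set(search_documents(documents, term)) | set(search_tracking(term)))


# Hypothetical documents keyed by location.
documents = {
    "BOX01/0001": "Wire 4471 went out for the Blue Star deal as we discussed.",
    "BOX01/0002": "Payment for the office cleaning invoice is attached.",
}

# A reviewer notices that BOX01/0001 concerns a payment although no key hits it.
track("BOX01/0001", "payment", "possible undisclosed payment; payee not named")

print(search_documents(documents, "payment"))
print(reconciling_search(documents, "payment"))
```

Here a search for “payment” against the documents alone finds only BOX01/0002, while the reconciling search also returns BOX01/0001 because a reviewer recorded it in the tracking table. The table helps only for documents that a reviewer has actually seen and flagged; it cannot recover documents that the initial search rejected before review.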


Search methods and prioritized search methods, together with proper search keys, are useful in cases where litigants will not suffer from hull-loss risks. The methods generally are incapable of reliably capturing all responsive documents important to the litigation outcome. They are incapable of reliably capturing short documents, troublesome documents, and documents intended to be kept secret. Both methods can negatively affect the performance of document reviewers, primarily by disrupting document context. They may indirectly increase the risk of exposure through over-inclusive production of non-responsive documents, which could be used by persons skilled in the subject.

Neither method should be used in criminal cases or high-stakes civil cases. Litigants on both sides of a civil case should be fully informed of the potential consequences and long-term effects. The performance of both methods in a given case depends upon the selected search keys, search algorithms, risk levels, and many other factors. In deciding whether to use a prioritized review method, the litigant should balance the need for early case assessment against the disruption to document context. Every effort should be made to preserve document context so that document reviewers will not inadvertently produce non-relevant sensitive documents.

This article discusses issues of general interest and does not give any specific legal or business advice pertaining to any specific circumstances.  Before acting upon any of its information, you should obtain appropriate advice from a lawyer or other qualified professional.

This article may not be duplicated, altered, distributed, saved, incorporated into another document or website, or otherwise modified without the permission of TASA. Contact marketing@tasanet.com for any questions.
