Big Data’s Impact on Litigation

Big data is one of those vague phrases like “the cloud” or “artificial intelligence.” We grasp the concept, but the specifics lie outside our day-to-day thinking.

We understand that data is everywhere, but most of us seldom give thought to its impact at an individual or societal level. We give up privacy in exchange for convenience – whether that means news feeds or search results tailored to our preferences, thermostats that know when we are at home, or smartphone apps that know which coffee shop we visit on our way to work. 

This information that is shared behind the scenes is coupled with the data we openly broadcast on social media channels, where we announce when we are out of town, what we had for lunch, and what our thoughts are on our co-workers, politics, religion and clothing brands. From status updates and internet searches to questions asked of our smart home assistants, we are providing a steady stream of data to businesses that are using this information to build elaborate customer profiles on each of us. 

The nebulous conglomeration of all of this information is the substance of big data. And its existence provides a new set of challenges and opportunities in litigation.

Big data has blurred the lines around what types of evidence can be presented, and courts continue to examine how all of this personal data figures into litigation. Personal social media posts and even private messages are potentially accessible in company litigation. With the increasing popularity of BYOD (bring your own device) programs across businesses and the ubiquity of smartphones, the line between professional and personal communications is not yet clearly defined. In coming years, determinations on the admissibility of big data collected on organizations and individuals will have significant impacts on the practice of law.

It is essential, therefore, for businesses and attorneys to understand the implications of big data.

How Big is Big?

Big data consists of large, complex datasets from multiple sources. These datasets are so big that traditional data processing software can’t manage them. Big data is defined by the volume, variety, and velocity of the data. 


  • Ninety percent of the world’s data has been created in the last two years alone. The total volume is expected to double every two years, reaching 40,000 exabytes by 2020.
  • The volume of data created by U.S. companies alone each year is enough to fill 10,000 Libraries of Congress.
  • By 2020, there will be more than 50 billion smart connected devices in the world, collecting, analyzing and sharing data.
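
To make the doubling claim concrete, here is a minimal arithmetic sketch. It assumes perfectly steady doubling every two years, anchored to the 40,000-exabyte figure for 2020 cited above; the function name and parameters are illustrative only, and real growth would not be this smooth.

```python
def projected_volume(known_volume_eb, known_year, target_year,
                     doubling_period_years=2.0):
    """Estimate data volume (in exabytes) for target_year.

    Assumption: volume doubles every `doubling_period_years`, so the
    estimate is known_volume * 2 ** (elapsed_years / doubling_period).
    """
    elapsed_years = target_year - known_year
    return known_volume_eb * 2 ** (elapsed_years / doubling_period_years)


# Work backward and forward from the 2020 figure quoted in the text.
for year in (2014, 2016, 2018, 2020):
    print(year, round(projected_volume(40_000, 2020, year)), "EB")

# Doubling every two years is equivalent to this yearly growth factor.
annual_factor = 2 ** (1 / 2.0)
print(f"Implied growth per year: {annual_factor - 1:.0%}")
```

Under that assumption, the 2020 figure implies roughly 10,000 exabytes in 2016 and an annual growth rate of about 41 percent.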


Variety refers to the range of formats involved. Traditional data was structured and could be stored in a relational database. There were defined fields and categories that could be searched. Big data comes in an unstructured format from a variety of sources and types, including text, audio, video, web server logs, social media information, streaming data from sensors, and e-mail transactions.


Velocity refers to the speed at which big data is processed. The handling and manipulation of this data would not have been possible with the computing power that existed just a few years ago.

Managing and Mitigating Liabilities

With the volumes of data flowing into and out of companies to and from customers, clients, vendors and partners, the importance of information governance and information security policies is increasing exponentially. The loss or misuse of customer data has serious ramifications. 

Companies no longer have the luxury of treating information governance as a reactive measure to be produced only in the event of litigation. Businesses must develop ongoing, reliable and repeatable processes to handle the data they collect and store, rather than waiting until litigation is threatened.

Information Governance

Ongoing information governance and analytics are crucial not only to the operational success of client organizations but also to their legal success.

“Information is the oil of the 21st century,
and analytics is the combustion engine.”

~ Peter Sondergaard

The EDRM Model

Big data is meaningless without a way to collect and process it. With data volumes doubling in size about every two years, organizations are struggling to find ways to contain and utilize it. Companies around the globe are working to refine overwhelming amounts of data into usable, actionable information. This is as important for attorneys as it is for marketers, developers, and business leaders.

The field of data analytics focuses on the examination of vast and varied datasets to uncover information such as hidden patterns, correlations, and trends in the data. Just as businesses use big data to improve customer service or increase profitability, attorneys must learn to use big data to better serve their clients. 

Through the use of legal analytics, firms can clarify existing documentation and extract factual information from exceedingly large datasets. With the flood of information now available, lawyers must shift to the new paradigm of automation and analytics to provide the most efficient delivery of legal services. It is important for all attorneys to understand the nature of big data and its implications, as its availability will permeate litigation in years to come.

Legal analytics are especially crucial in document review and discovery.

EDRM Stages

The Electronic Discovery Reference Model (EDRM) is a framework that outlines standards for the recovery and discovery of digital data. It is meant to serve as a conceptual standard for the eDiscovery process.

  • Information Governance: This ongoing management process refers to daily operational handling of electronically stored information (ESI) from creation to final disposition.
  • Identification: Locating all potential sources of ESI and determining scope.
  • Preservation: Ensuring ESI is protected against inappropriate alteration or destruction.
  • Collection: Gathering ESI for use in the eDiscovery process.
  • Processing: Initial conversion to a usable format for review and analysis and reduction of the dataset.
  • Review: Evaluating ESI for relevance and privilege.
  • Analysis: Evaluating ESI for content and context, including key patterns, topics, people and discussion.
  • Production: Delivering ESI to others in appropriate forms.
  • Presentation: Presenting ESI at depositions, hearings or trials.
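
For teams that track matters in software, the stages above can be sketched as an ordered enumeration. This is only an illustration: the class and function names are hypothetical, and the strict one-way ordering is a simplifying assumption, since the EDRM is a conceptual model and a real matter may loop back to earlier stages.

```python
from enum import IntEnum


class EdrmStage(IntEnum):
    """EDRM stages in the order listed above (names are illustrative)."""
    INFORMATION_GOVERNANCE = 1
    IDENTIFICATION = 2
    PRESERVATION = 3
    COLLECTION = 4
    PROCESSING = 5
    REVIEW = 6
    ANALYSIS = 7
    PRODUCTION = 8
    PRESENTATION = 9


def next_stage(stage):
    """Return the stage that follows `stage`, or None after Presentation.

    Assumption: stages are walked strictly in order, which real
    eDiscovery work does not always do.
    """
    if stage is EdrmStage.PRESENTATION:
        return None
    return EdrmStage(stage + 1)


# Walk the whole model from start to finish.
stage = EdrmStage.INFORMATION_GOVERNANCE
while stage is not None:
    print(stage.value, stage.name.replace("_", " ").title())
    stage = next_stage(stage)
```

Using an ordered type makes it easy to ask questions such as whether a dataset has passed Preservation before Collection begins, for example `EdrmStage.COLLECTION > EdrmStage.PRESERVATION`.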

An effective and ongoing EDRM strategy is of crucial importance to attorneys and the clients they serve, especially considering the direction of court rulings regarding document retention and delivery. Improper data management could result in severe sanctions. Courts are increasingly unsympathetic to the argument that a defendant does not have adequate financial, human or technical resources to comply with eDiscovery requests.

Just as the growth of big data has presented challenges to the legal profession, it also offers solutions. Big data processing technology is frequently used in the eDiscovery process, where it significantly reduces the time required, particularly in document review.

The Legal Landscape

Courts are becoming increasingly sophisticated with regard to automation of the review process and the efficacy of machine learning and other technologies related to the production of ESI. In the recent ruling in Rio Tinto v. Vale, U.S. Magistrate Judge Andrew Peck stated that if a technology-assisted review uses continuous active learning (a machine learning approach), the contents of the seed set are “much less significant.” The ruling was notable as a tacit acknowledgment that technology is at least as reliable as human review in document selection. Judge Peck had previously permitted the use of predictive coding in review in a groundbreaking decision. These rulings are among many in a string of cases and amendments to the Federal Rules of Civil Procedure (FRCP) addressing the treatment of ESI.
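
For readers unfamiliar with continuous active learning, the toy sketch below shows the general shape of the loop: the model is retrained after every batch of reviewer decisions, which is one way to see why the initial seed matters less over time. Everything here is an assumption made for illustration, including the word-weight scoring, the batch size, the stopping rule, and the sample documents; it does not represent the method at issue in any ruling or the workings of any commercial review platform.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_pairs):
    """Learn per-word log-odds weights from (text, is_relevant) pairs."""
    relevant, other = Counter(), Counter()
    for text, is_relevant in labeled_pairs:
        (relevant if is_relevant else other).update(set(tokenize(text)))
    vocabulary = set(relevant) | set(other)
    # Add-one smoothing keeps the ratio defined for unseen counts.
    return {w: math.log((relevant[w] + 1) / (other[w] + 1)) for w in vocabulary}


def score(weights, text):
    """Higher scores mean the document looks more like relevant ones."""
    return sum(weights.get(w, 0.0) for w in set(tokenize(text)))


def continuous_active_learning(documents, seed_index, reviewer, batch_size=2):
    """Return {document index: reviewer decision} for every reviewed document."""
    labeled = {seed_index: reviewer(documents[seed_index])}
    while len(labeled) < len(documents):
        # Retrain on everything reviewed so far, then rank what is left.
        weights = train([(documents[i], label) for i, label in labeled.items()])
        unlabeled = [i for i in range(len(documents)) if i not in labeled]
        unlabeled.sort(key=lambda i: score(weights, documents[i]), reverse=True)
        batch = unlabeled[:batch_size]
        for i in batch:
            labeled[i] = reviewer(documents[i])
        # Stopping rule (assumption): stop once a batch has no relevant documents.
        if not any(labeled[i] for i in batch):
            break
    return labeled


# Hypothetical documents; the lambda stands in for a human reviewer.
docs = [
    "merger pricing memo for the board",
    "lunch menu for friday",
    "pricing strategy for the merger",
    "board review of merger pricing",
    "parking garage closed friday",
    "holiday party menu",
]
decisions = continuous_active_learning(docs, 0, lambda text: "merger" in text)
print("Relevant:", sorted(i for i, hit in decisions.items() if hit))
print("Reviewed", len(decisions), "of", len(docs), "documents")
```

In this toy run the loop surfaces the three merger documents first and stops before every document has been reviewed, which is the time savings that technology-assisted review aims for.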

Over the past two decades, laws have evolved to handle digital information. In Zubulake v. UBS Warburg, Judge Shira Scheindlin ruled on a party’s duty to preserve digital evidence, a lawyer’s duty to monitor a client’s compliance, and the imposition of sanctions for spoliation of digital evidence. Together with the changes to the FRCP that added provisions for the discovery of ESI, such rulings have made the law clearer on the entire eDiscovery process.

Cicayda Can Help

As the courts continue to transform their approach to electronic data, attorneys must adapt to represent their clients properly. They must work to help their clients identify potential sources of relevant information and put them in the best position to defend themselves in case of litigation. 

Cicayda’s review platform, Reprise, enables clients to search, organize and analyze electronic documents faster than ever. The early case assessment feature is an automated assistant that maps out cases and gauges budgeting needs. Reprise’s review tool manages multiple attorney reviewers analyzing documents all within a single dashboard. 

Cicayda also offers a newly updated legal hold tool, Fermata, which provides protection and defensibility at the onset of litigation. These technologies are shaping the law today and will be a prevalent force in the future. Join us on the track to further innovation and gain efficiency in every arena.