How data is changing the e-discovery landscape

It is no secret that the technology used to shape our global economy is now being used by corporations and law firms to shape their approach to the discovery process.  

Gartner is forecasting roughly 15 percent annual growth in the e-discovery software market over the next four years, bringing worldwide revenue from $1.8 billion in 2014 to $3.1 billion in 2018.

With explosive growth in data volumes and technological trends driving business innovation, computer assisted review is no longer the future of discovery; it is already here and changing the way we look at information.

Hardcopy was not always easy but it was tangible; we could see it consume offices and warehouses and feel it as it was making its way through the discovery process. Today, the relentless update cycles of hardware and software and the ubiquity of the Internet are driving the world we live in, personally and professionally.

Our data footprint is increasing; it is expected that the volume of data in the world will double every two years, totalling 40,000 exabytes by 2020. Furthermore, the amount of user-generated data is growing at an estimated rate of 62 percent per year. Given this trend of overwhelming data growth over the next several years, it is not surprising that the electronic discovery (or e-discovery) market is expected to see a double-digit growth rate over the next few years.

While many jurisdictions may still be in their e-discovery infancy when compared to the United States, the inevitable adoption of e-discovery best practices and evidentiary rules cannot be ignored. Discovery has changed over the last decade, and it is still evolving. As a result of the continuing development of discovery-specific technology and analytics, we are now better able to understand data and how different sources relate to and interact with each other.

We are also better able to understand concepts and how to classify data across otherwise unmanageable volumes. Analytics is not just a trend in business; it is about how businesses operate at micro and macro scales.

The same concepts can be applied to discovery with highly accurate results that save time and can drastically reduce cost. Through careful planning and developing relationships with service providers, discovery practitioners are now able to reduce the volume of data processed, hosted, and reviewed with greater statistical certainty.

Changing landscapes

In the United States, the Federal Rules of Civil Procedure were amended in 2006 to include provisions for the discovery of electronically stored information (ESI). Similarly, in England and Wales, provisions have been added to the Civil Procedure Rules aimed at streamlining the litigation process and managing costs related to disclosure.

Such developments in procedural rules, together with increasingly active judicial interest in disclosure issues and mushrooming data volumes, are driving a shift in the discovery landscape, generating new roles and trends in the discovery profession and powering further technological innovation. These trends have created specialist requirements that focus on the management of the process and the deployment of technology-centered review strategies to manage cost.

While court systems are evolving in the way they approach electronic data, the e-discovery market is evolving to handle the proliferation of new sources and types of data such as social media. Profile pages, private messages, posts, tags, and other types of publicly available messages must also be evaluated to determine what information may be available, whether it is relevant, and where it resides. Once potential sources of relevant information have been identified, a plan needs to be put in place for the defensible collection and authentication of this evidence.

Another evolution in ESI is the growing use of smartphones and a surge in ‘bring your own device’ (BYOD) policies by corporations. As the number of smartphone and tablet users increases, so does the likelihood of corporate data coexisting with personal data on employee-owned devices.

Organizations need to focus on comprehensive mobile device management solutions that provide detailed guidelines for appropriate use of corporate resources and information. Additionally, these solutions should be complementary to a comprehensive mobile device investigation, incident response, and e-discovery plan.

The discovery process

Of paramount importance in discovery is a process the organization can follow to identify the key risks associated with a discovery engagement and to manage those risks throughout the discovery process. This process should balance the requirements of discovery without being so prescriptive that it locks the organization into an inflexible methodology or solution, and it should identify areas where the overall cost of discovery can be reduced. Determining what can be done in-house, what should be done by a vendor, and what should be given to outside counsel, and weighing the costs and risks of each, will assist with decision-making and lead to a defensible and cost-sensitive discovery strategy.

Generally, contracts can be negotiated with third-party hosting vendors that deal with data volume across multiple matters, which will drive down the per-gigabyte cost of smaller, one-off matters. To go a step further, agreements can be made with vendors that offer full-service discovery capabilities (e-discovery readiness plans, collections, filtering, processing, hosting, and review) that leverage volume discounts and partnerships to reduce the overall cost associated with discovery. These relationships can be developed to build a standard protocol for discovery and provide consistent levels of service.

Analytics in discovery

Analytics can play a role through all stages of the discovery process, from helping an organization create a discovery plan to helping counsel understand and categorize documents in a production set. Developing discovery strategies based on the Electronic Discovery Reference Model (EDRM) and Computer Assisted Review Reference Model (CARRM) will help organizations respond intelligently, quickly and consistently to legal demands. With a better understanding of its data and how it is stored, an organization will be better placed to respond, to negotiate discovery terms with opposing counsel or regulatory investigators early in the process, and to manage data volumes, review times, and costs responsibly.

Another early stage of discovery where analytics can play a key role is the collection phase of the EDRM. Analytic tools can be deployed across networks to identify and visualize data types for specific custodians or network shares. Additionally, tools can be deployed at the point of collection to analyze physical media, providing the legal team not only with an understanding of what data exists in a collection before it is processed and hosted, but also with insights into additional custodian data that may exist on a single piece of hardware.

By having the ability to access and view key operating system artifacts and file metadata at the time of collection, counsel can make real-time decisions on what data sources should move to the next phase of discovery.

Collection analytics can also help in situations where it is necessary to preserve large amounts of data early on to mitigate the risk of data loss. Without a system to fully track this electronic evidence and conduct analytical queries on the collected data, the cost of discovery can increase rapidly when unwanted and non-responsive data is processed and hosted.
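To make the idea of querying collected data before it is processed more concrete, the following is a minimal Python sketch rather than a description of any real collection tool. It assumes a hypothetical folder layout in which each custodian has a sub-folder under a collection root, and the function names `summarize_collection` and `filter_by_extension` are invented for illustration. Real collections are usually forensic images or containers that need dedicated tooling.

```python
from collections import defaultdict
from pathlib import Path


def summarize_collection(root):
    """Return {(custodian, extension): [file_count, total_bytes]}.

    Assumption: files are laid out as root/<custodian>/... (hypothetical).
    """
    summary = defaultdict(lambda: [0, 0])
    root = Path(root)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 2:
            continue  # ignore files that are not under a custodian folder
        custodian = parts[0]
        extension = path.suffix.lower() or "(none)"
        entry = summary[(custodian, extension)]
        entry[0] += 1                      # number of files
        entry[1] += path.stat().st_size    # total size in bytes
    return dict(summary)


def filter_by_extension(summary, wanted):
    """Keep only the rows whose extension is in the wanted set."""
    return {key: value for key, value in summary.items() if key[1] in wanted}
```

A legal team could, for example, call `filter_by_extension(summarize_collection("/collections/matter_x"), {".msg", ".docx", ".xlsx"})` (the path is a placeholder) to see how much of each custodian's data falls into the file types agreed for processing, before paying to process and host the rest.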

In addition to helping organizations reduce data volumes leading up to review, analytics is beginning to play an important role in the review process. Through technology such as machine learning and text-based analytics, the first pass review of one million documents can now be conducted by reviewing and coding fewer than 10,000 documents.

Machine learning is modeled after the learning capabilities of the human brain; systems utilizing machine learning are capable of understanding and parsing complex data across multiple sources and extrapolating concepts and connections. Using computer assisted review, each round of review allows the analytics engine to better understand subject relevance and confidence interval based on unbiased human feedback.
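As a rough illustration of how each round of human coding can sharpen a relevance model, the sketch below learns term weights from documents that reviewers have already coded and uses them to score uncoded text. It is a deliberately simple naive-Bayes-style term scorer written for this article, not the method used by any particular review platform. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_term_weights(coded_docs):
    """coded_docs: iterable of (text, is_relevant) pairs coded by reviewers.

    Returns {term: weight}, where the weight is the smoothed log-ratio of
    how often the term appears in relevant versus non-relevant documents.
    """
    relevant, other = Counter(), Counter()
    for text, is_relevant in coded_docs:
        (relevant if is_relevant else other).update(tokenize(text))
    vocabulary = set(relevant) | set(other)
    # Add-one smoothing so unseen terms never produce log(0).
    rel_total = sum(relevant.values()) + len(vocabulary)
    oth_total = sum(other.values()) + len(vocabulary)
    return {
        term: math.log((relevant[term] + 1) / rel_total)
        - math.log((other[term] + 1) / oth_total)
        for term in vocabulary
    }


def score(text, weights):
    """Higher scores mean the text looks more like the relevant examples."""
    return sum(weights.get(term, 0.0) for term in tokenize(text))


# Hypothetical first round of reviewer coding.
coded = [
    ("wire transfer to offshore account", True),
    ("offshore account statement", True),
    ("lunch menu for friday", False),
    ("friday team lunch", False),
]
weights = train_term_weights(coded)
```

After each further round of review, the newly coded documents would be appended to `coded` and the weights retrained, which is the feedback loop the paragraph above describes in general terms.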

While legal jurisdictions are still in the early stages of vetting and allowing computer assisted review in discovery, there are other practical applications for this technology. One such application is using computer assisted review to identify relevant information when you have been provided with a “document dump.”

A document dump occurs when opposing counsel provides you with a dataset containing responsive documents intermingled with an overwhelming amount of non-responsive documents. By using computer assisted review, the human reviewers are able to focus on the most relevant and statistically significant documents early in the review process, without the distraction of large volumes of non-relevant items.

This same approach allows counsel to conduct second and third pass reviews of the most relevant and potentially the most important documents first. While there will still be instances where all of the documents will need to be reviewed in a linear first-pass review process to determine relevancy, the most statistically relevant documents can be batch-assigned to the review teams first, greatly improving the speed of the review process.
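The batch-assignment step described above can be sketched in a few lines of Python. This assumes the analytics engine has already produced a relevance score for each document. The document IDs, the scores and the function name `prioritized_batches` are all hypothetical.

```python
def prioritized_batches(scored_docs, batch_size):
    """Split (doc_id, score) pairs into review batches, highest scores first."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Rank documents from most to least likely relevant.
    ranked = sorted(scored_docs, key=lambda item: item[1], reverse=True)
    return [
        [doc_id for doc_id, _ in ranked[start:start + batch_size]]
        for start in range(0, len(ranked), batch_size)
    ]


# Hypothetical scores from an analytics engine.
batches = prioritized_batches(
    [("DOC-001", 0.20), ("DOC-002", 0.90), ("DOC-003", 0.50)], batch_size=2
)
```

Here the first batch handed to reviewers contains the two highest-scoring documents, and the lowest-scoring document falls into the final batch, so a full linear review can still be completed if required.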

Linear document review accounts for a considerable portion of document production. Incorporating analytics into your discovery approach can not only reduce the volume of data collected and reviewed, but may also facilitate significant cost savings.


The best way to reduce the overall cost of discovery is to develop a litigation readiness plan before it’s too late. While no two discovery matters are the same, the major components will remain consistent over multiple matters. Understanding your organization’s digital environment and retention policies, as well as identifying and educating key information technology resources, can reduce the amount of time, and thus the expense, of identifying and collecting relevant sources of ESI.

Another key component of litigation readiness is the development of roadmaps that will help stakeholders understand critical milestones and processes; this can include planning for collection strategy, filtering by file type, keyword lists, implementing computer assisted review technology, target production dates, and production formats. Lastly, effective and targeted deployment of analytics resources on both the left- and right-hand sides of the EDRM can significantly reduce cost and decrease collection, processing and review times.


Nick Kedney

Nick works primarily in conducting investigations into international commercial fraud and money laundering, litigation support and asset recovery. He also specializes in the deployment of data analytic and online research resources in financial investigations.

Nick Kedney
Deloitte Forensic
Citrus Grove.
Cayman Islands

c: +1 (345) 925 5404
t: +1 (345) 814 2281

John White

John has over 500 documented hours of investigative and forensic training. Specialties include Operations Management; Evidence Processing/Chain of Custody; Security Management; Communications Process Improvement; Data Analysis; Data Recovery; Forensic Data Acquisition; Data Carving; Technology Implementation; Troubleshooting; Customer Service; Training and Development. 

John White
Deloitte Forensic, Citrus Grove

t: +1 (345) 949 7500