Aaron Cronan: Welcome to the continuing legal education presentation of eDiscovery Primer: What You Need To Know And Why. I am the presenter, Aaron Cronan. And I will be your guide through what can be a daunting topic with a lot of moving parts that many attorneys are either consciously or unconsciously trying to avoid. So we're going to take a look at this step by step in a way that I hope is digestible and that will be able to stick and give you a little bit more comfort in moving to the next step in adapting to a world that is almost completely digital at this point, when it comes to almost every issue. So I'm going to have some illustrative examples, and hopefully some concrete uses that you can implement as you come out of this. I graduated from law school in 2000 from UC Hastings, and I believe we were the last class to actually touch books for book research. At that point, they made us do both. We had to learn how to do book research and then we got to use the electronic tools.
After us, I think they just went straight to teaching you how to use the electronic tools. It was a good example of how superior digital research was or the digital tools were over the analog paper. There's still something to be said for holding on to paper and reading it. I still feel that way. I absorb things differently with having some sort of spatial relation to where things are on a page. But that said, the process of actually searching for terms, finding those terms, digging into references, linking to other references, shepardizing, all of that becomes so much more effective and faster when you're dealing with electronic records. Not to mention the fact that you don't have to maintain an entire library, a fortune's worth of a library in every single law office to pull off this research. So it's a great leveling tool. When I graduated, the dot-com's were just starting to fail in 2000. So that was changing the landscape quite a bit. But what ended up resulting was we ended up with a lot of residual impact.
The internet really got its feet around that time and we started seeing that email became ubiquitous for most communications because it is a better tool. It's faster. You have a record that is searchable. So businesses and most other use cases started implementing these digital records for communication. And that started definitely changing the landscape of how litigation was handled. We were not having to just sift through paper anymore. But at that time we were still printing to paper. So the firm that I first worked at didn't even have computers. So just to point to the fact that attorneys are some of the last ones to adapt to new technology, is that we didn't get computers on our desk until 2002. And up until that point, I was just doing dictation into a machine and then my assistant would type it up and then I'd have to review paper. And they were the only ones with computers. So there's something to be said about maybe attorney's billable time being transitioned, where now I've spent a lot more time writing things, probably actually crafting words and editing myself.
But there's a lot of things about this transition that have been beneficial to businesses. Especially now with COVID and having to go remote, I have now transitioned; my business is almost completely digital and my office is 90% virtual. I did deal with some technological issues coming out straight into litigation. I dealt with peer-to-peer file sharing. But I'm by no means some technical guru. I don't have a technical background, but I've always enjoyed the objectivity of it. So there's something about the schematics and these Venn diagrams of where data might be that I found very interesting. And that drew me into working in electronic discovery. Around 2006, electronic discovery became... There was a sea change. We had had some major cases involving electronic evidence that changed the face of how litigation works, especially at the federal level. And it's been trickling down to the state level ever since. So we are in a position now where it is virtually everywhere. And now I'm going to make my case to you for why you need to know about eDiscovery.
So first we're going to talk about why eDiscovery then we're going to go through the workflow of eDiscovery so you can visualize the overview of how it works. I imagine much of it you understand, but there's some niche elements of it that I think if you're more comfortable with will give you a lot more leverage when you're dealing with it. Then we're going to go pretty hard into RFPs, because that is where I think the real gold is. The RFPs are really a strong foundation and jumping off point for your litigation and give you incredible leverage in discovery if you do it right. Okay. So let's talk about why eDiscovery and why it's really important that you stop hiding from it and then we start looking into it. One of the main things is that electronically stored information is everywhere. It is everywhere.
So the default definition comes out of the Federal Rules of Civil Procedure, Rule 34(a)(1)(A), which defines "documents or electronically stored information- including writings, drawings, graphs, charts, photographs, sound recordings, images, and other data or data compilations stored in any medium from which information can be obtained either directly or, if necessary, after translation by the responding party into a reasonably usable form." This is so broad that there is no conceivable storage of anything that contains any information that does not fall under this. This basically surrounds every single thing. So every piece of digital anything is discoverable if it falls within the other realms of relevance or responsiveness under the federal rules. So we can't pretend stuff that's digital shouldn't be involved in cases. I will go on to say that ESI, we're now going to call Electronically Stored Information ESI, that's the shorthand, that ESI, sometimes I'll call it data, it touches every single case. I would challenge anyone to think of a case that actually doesn't contain some touch point of electronic information.
So if you can imagine even just the most simple rear end case; two cars, it's just a rear end case. What could be more physical in the real world? Well, if you think about the fact that one of the drivers or one of the passengers or maybe a witness might have texted someone before, during, after, maybe they were listening to something, maybe they were touching a screen. Something about that interaction probably touches on an electronic feature. And whether or not you're in eDiscovery, what we would consider eDiscovery or more of a forensics thing, I'll get into that later. But there is a touch point of something electronic. You might even be able to get dash cam footage or know that the light was a certain color. Other things about eDiscovery that are really important is that if done properly, everything in the data set is instantly searchable. You shouldn't have to sift through pages and pages, thousands of pages of documents to get to something. You should be able to find it very easily with the right search terms. So that is really important.
The other thing is that it's easy to authenticate. If done properly, you can authenticate documents much better than you could in the analog world. You can fake stuff in analog and forge things that is very hard to do in the digital realm. And there's reasons for that, having to do with the metadata and the registries on the computers and just how data is stored. It leaves footprints pretty easily. All this electronic evidence is expressly discoverable in almost every jurisdiction at this point. So I have to deal with this across the country. Most states have something in their civil procedural rules that would allow for the discovery of electronic evidence. Judges are still catching up, but once I get involved in the case, usually things start smoothing out and we are able to get to that data. eDiscovery gives you leverage, especially in plaintiff's cases. And I would say that this case is mostly aimed towards I'm coming from largely a plaintiff seeking discovery from a larger entity standpoint. But I also have experience defending against these requests.
So I got it pretty balanced on how to do it the right way so that you're not being ridiculous. But it does give you leverage, especially if there's something that they're hoping you're not going to get to. It's probably in an email or in an old document somewhere. And if they want to avoid it, they're going to give you a lot of trouble in the discovery process. But that's where the smoking guns are. The metadata is hugely helpful. Again, if you do it right, you're going to have information about these documents, when they were created, who edited them, that is not present in any kind of analog, even a PDF. If they printed a PDF or something, you're not getting that information. It's hard to edit and change records without there being some information lost or gained or some sort of evidence that the record was altered. This is all part of the metadata. There are a lot of timestamps on it that help you build up a history. It helps you find things. It also helps you determine when or where something was changed or created.
There are audit trails, usually, that you can track down to help you determine whether or not something was altered or when it was created, and you can filter out duplicates. Even if somebody dumped you with multiple duplicates of things, it's a lot easier to filter through that than it would be in a data dump or a discovery dump that you would normally get in paper. The trouble with eDiscovery is that it has a pretty steep learning curve. There's no question about it. And every time you're going to get into a tool, you're going to have that curve. I deal with this a lot, where people kind of understand what they're talking about and they get themselves into trouble because there are two attorneys that only vaguely understand what they're dealing with and they come up with solutions that don't make any sense. And then we have to sort of backfill the problem. There are huge data volumes and you can get yourself or the requesting party into a quagmire of how to just deal with a massive amount of data to sift through.
And how do you find stuff in a huge haystack? It requires special tools, and those tools usually involve a cost. They're not, they're not usually free. There are some tools that are pretty strong that are very affordable, but some of the really, really dialed in ones with a lot of special features do cost some money. The prices have come down significantly from when I first started working in it, it cost $2,000 a gigabyte to process data. And that price has come down into, you can process data for like 15 bucks. Or actually they'll just process it for free and store it for 15 bucks. But that's on the lower end. Sometimes, it depends on what the package is for the review tool. It costs a lot of money to have, especially the responding party, to go through the data if the data set is very large and code things and review and review for privilege. It can be very time consuming. It's great billing for the firm, but there are a lot of tools now that allow us to filter through and skim through that stuff faster and really large sets.
You can start using machine learning and filter things out that way. A failure to preserve and collect is a pretty common problem. There are problems with processing and exporting the data and importing the data. So you can get just a lot of headaches around how to transfer the data; their production set from them to you. But we're going to get into some of the tools and ways of really dialing that in so that is not so much a problem anymore because you don't give them any room to mess up. You're very specific about what you're going to ask for. Now, the other reason why I really think it's important that people, that attorneys start focusing on eDiscovery is because at this point, it really starts to impact your duty of competence. Electronic evidence is a huge part of everything we touch upon. The ABA Model Rules specifically say that we have an obligation "to maintain the requisite knowledge and skill, a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology."
There's a number of circles around. I know my state is definitely pushing that attorneys have to be technologically savvy. That means you got to know how to do research online. You need to know how basically electronic evidence is going to work. And at this point, if you don't know how to work with electronic evidence in litigation, you are likely falling short of your duty. Imagine, if you could have found easily the smoking gun, the great evidence in your case, and you missed it because you went to paper and were searching with your eyes. One of the things we need to do to get our heads around eDiscovery is to think about using the right tools for the job. And there's a sliding scale from one end, which is forensics, to the other end, which is just massive data source, and some spaces in between. And so I like to think of this as you're digging in your backyard. And on the forensic level, you need forensics tools. You are not going to be using a shovel.
You're probably going to be using hand tools and brushes and gentle things to tease out what you need. And that is the forensics level on the ESI side. So if you need to know whether or not a document was forged or whether or not something was deleted on purpose. So for example, I have two cases that ran right up against this that were pretty good examples. One involved changing a contract for the sale of a business after the seller had passed. The buyer claimed that on the day that they signed, they changed out one of the pages in the document. She stopped making payments because the terms in that one page that they had changed indicated that she wouldn't have to make payments anymore if the seller passed. That is not the copy that everybody else had.
So in order to prove that this had been drafted by her after the date of the signing, we were going to get our hands on her computer and have a forensic examiner take a look at the hard drive and determine whether or not that file existed on the date that they signed, or before or after. My bet was going to be that it was going to show up just a few months before the litigation started. We never got to that because the laptop magically disappeared and the defendant decided to settle. She said that the laptop had gotten stolen. So that was an example of where we were going to use a forensic tool to authenticate or determine when a document actually had been generated and where. And so we started collecting off of their hard drives and stuff, but what we really needed to do was get at that laptop. Another case where forensic examination was important was a case involving some IP theft or trade secret theft.
The claim was some employees had conspired to take some IP, some trade secrets and go and start a competing business, and then deleted a bunch of stuff off of the computers. So what we needed to do was bring in a forensics person to examine each hard drive and determine what was accessed on what dates. We were able to get down to the point where it could tell us when somebody stuck a thumb drive into the computer and could actually hear and see when the ding sound for the thumb drive being pushed in and pulled out went off and tell us what files were accessed. So that is the forensics level. And that is very, very specific for very small and targeted acquisition of data and information. Usually you are dealing with accessing the other party's hard drives or information. Moving up the scale, we have the small data set and small document set, which in that case, you could conceivably just hand tool everything, like gardening your backyard yourself using a shovel or a hoe, you could conceivably avoid using a review platform tool.
But at anything above 100 pages, we're really starting to look at where you're going to want to use an eDiscovery tool. And even at that level, I would recommend that you use a tool and require that they export through a tool, because otherwise you lose the metadata. So they either need to give you the data files that are unaltered, and we'll get into how that's structured, or they need to give you a load file set with metadata. And we will discuss why that is in a little bit. And then you have the massive, or large to massive data sets where you're dealing with thousands of documents or hundreds of thousands of documents. And there's just no way you can manage that without large tools. And even in Med Mal cases, getting access to some of the records, as I've dealt with in the last couple of nursing cases, the emails and the communications around the matter are highly relevant and important to a systemic problem sometimes, and you just need to have eDiscovery tools to deal with that and you need to know what you're going to ask for and how you're going to deal with it.
So moving on to my eDiscovery philosophy. You do need a review platform or tool to go to. It's like a dog chasing a car: what is it going to do when it catches it? So you can ask for what you're going to ask for, but if you don't have any way to ingest it and look at it in a structured and logical process, you are going to have problems. So you need to figure out what tool you're going to use ahead of time and then start building your plan around that. You need to be careful what you ask for in your RFPs. I get called in frequently as a consultant or expert to help unwind some RFP fight where the plaintiff has made requests that are just way too broad or sloppy and not specific enough and it gives the defense a lot of wiggle room to move around that. So I want to zero in on things so they can't do that. Also, be careful what you agree to. Very often, again, attorneys that kind of know what they're doing will agree to things that don't necessarily make sense.
And then they come up against problems and by that time, the defense has already spent a bunch of money on a process that doesn't really work. And then when you get the files, you got to verify that that data set is complete. There are a lot of weak spots you can probably find in discovery. And I see this on my end. I don't know if this happens outside of when I'm involved, but usually by the time I'm involved in a case, I'm not involved very long because the defense rolls over. They've been putting up a fight, they've been blowing smoke for a while. And then we push on a couple spots and suddenly the case dynamic changes. You got to watch out for game playing. I see all the time where metadata that should be in a file is not provided, and we're going to go through how to stop that from happening. You want to watch for gaps and holes.
I just got a case now where somebody, they produced what looked like a really complete data set but there's a bunch of emails that are missing attachments. Attachments are inconsistently attached. It might be an honest mistake, but it also could be shenanigans. Either way, my client is entitled to getting a full set of documents that are responsive. And I will say this here and I think I'll repeat it a couple times. Searchable text must stay searchable. If somebody is giving you a document that has been pushed out to print or paper or made a PDF that's flat and then they have OCR'ed it or scanned it and now you're supposed to search that, it is garbage. If it was a natively searchable document, it has to stay natively searchable. And if it's not, you've got a reason to scream and maybe move for a reproduction of documents. So we're going to go through some key concepts. We discussed ESI, the Electronically Stored Information, above.
And that is if you can conceive of something that is digital in any way, shape or form and is actually information, then it could be discoverable in a case. Metadata. That's a term that people throw around all the time. They'll say it's the data about your data. Well, it is. So if you have a file, a Word document, the metadata is all the information that your computer knows about that particular file. It knows the date that it was created. It knows the location that it was stored. It knows the changes that were made to it and knows who the author was. All that information is kept in the file. And when it's collected, it can be pulled out and used and provided to you when you're making the request. The next thing is native files. And so these are the things that people like to get into fights over. Native files can be really useful in a discovery request. But what a native file is, is just the original file. So if you think again about that Microsoft Word document, the native file is that document. That's it.
You could open it up on your computer and it would open up that file and you would see that document. Native file review is not ideal. No eDiscovery expert or consultant is going to recommend you start opening up files and looking at them natively. It's just not a good idea. You change the metadata, you change all kinds of things. So you do need a level of forensics in how that data is collected. So the native files are like the MSG files in your email. JPEG is a native file for images. The .doc files are all native files. A TIF image is what you'll commonly see in a load file, where they've taken the native file and basically created a flat image of it. The benefit is that it's immutable, it's just an image. And it's like you printed it, but you printed it electronically. A PDF is something we're commonly using now through Adobe Acrobat. And that is really just a TIF with a wrapper around it and a lot of editable information.
So every PDF contains that TIF. Especially in eDiscovery, if you're going to get a TIF load, a PDF file, it's going to contain the TIF, more than likely. Something else to think about is a PST, which is the default email mailbox out of Outlook. So Microsoft uses a Personal Storage Table, the PST. So that PST, if I export it out of my Outlook right now, I've got all of the emails, all my contacts, everything. You can control the size of the PST export, but that's usually what somebody's going to collect and put into the eDiscovery processing system, is this PST mailbox. Out of Google, they call it an MBOX. So it's a different format but largely the same concept. And then OCR is a term I'll use a couple times here and that's Optical Character Recognition. So that's when you have a document that doesn't have searchable text on it and then the computer will look at it and try to determine what the letters are so that now you have searchable text. This is not a foolproof option. And I've seen a lot of weird results come out of it.
For example, I recently just had a case where all the .com's in all of the emails basically got converted to .corn, which looks like .com, but is not. And so when we were having some weird results come up when we were running searches on the production we were given, I realized that they had OCR'ed everything and not given us native extracted text, which made me upset. That was grounds for again, questioning, was this a mistake? Was there a reason behind this or was this shenanigans? So here's another concept that you're going to hear a lot about, which is the loadfile. And I want to dive into this for a minute so that you have a little more comfort with what it is. The loadfile is effectively the way of transmitting the production data, the information responsive to the RFP, from the producing party to the requesting party. So we're going to go through the steps of how the workflow works. But effectively the loadfile will contain the set of records that they're producing to you.
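The OCR problem described above, where ".com" comes out as ".corn", is the kind of thing a quick script can flag in a production's text. This is only a minimal sketch under stated assumptions: the pattern and the function name `find_ocr_suspects` are hypothetical illustrations, not part of any review platform, and a hit only suggests that the text may have been OCR'ed rather than natively extracted.

```python
import re

# Hypothetical pattern: an email-like token whose domain ends in ".corn",
# a typical OCR misreading of ".com" (the letters "rn" resemble "m").
OCR_SUSPECT = re.compile(
    r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.corn\b", re.IGNORECASE
)

def find_ocr_suspects(text):
    """Return email-like strings ending in '.corn' instead of '.com'."""
    return OCR_SUSPECT.findall(text)

# Made-up sample text, for illustration only.
sample = "From: jane@example.corn To: bob@example.com"
suspects = find_ocr_suspects(sample)
print(suspects)
```

If a check like this lights up across a production, that is a reason to ask whether the text was extracted natively or run through OCR.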
If you can picture, it's the banker boxes of the production set. But instead of banker boxes, it's just digital files. It'll usually be transmitted these days over a secured file sharing program, which allows you to download directly off of a server someplace. And what it contains is there will be a DAT file, which is really a comma-delimited or some sort of delimited, they're probably not comma-delimited, database file. You can open it up in Excel if you know what you're doing. I do that a lot. But if you look at it, it's an index of all of the records. So if you think about a row across of one record, there will be a Bates number, there will be a document ID number, there will be a bunch of metadata about that particular record. And there will be a couple of pointers that point to folders. And one will be a text folder and one will be an image folder. And if there's natives, there'll be a native folder.
So those are the other folders that will come along with this DAT file, in the loadfile. So the load file is all of that: there is a DAT file, and then you have a folder that contains text. And that text is searchable. Usually it's just a Notepad-style text file that has all of the searchable text from the original file that it was pulled from. So if I wrote an email and that email is in the production set, all of the text from that email would be in one file inside this text folder. And there's also an image folder. And that image folder contains usually a TIF, which is the image of that email printed flat. And the reason we do this is because the text and the image are pretty small files and they allow for searching and review without editing or messing with the actual file itself. And you don't have to open up a browser or the native file's application. If you want to look at the email, you don't have to open up Microsoft Outlook to look at it, which is the way you would look at a native file.
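To make the DAT file described above a little more concrete, here is a minimal sketch of reading one as an index of records. It rests on assumptions: the delimiters follow the common Concordance-style convention (an ASCII 20 character between fields and a thorn character, þ, around each value), and the field names `BEGBATES`, `DOCID`, `AUTHOR`, `TEXTPATH` and `IMAGEPATH` are hypothetical. A real production may use different delimiters and field names, so check the production cover letter or ESI protocol.

```python
import csv
import io

# Assumed Concordance-style delimiters (not stated in the talk):
FIELD_SEP = "\x14"   # ASCII 20 separates fields
QUOTE = "\u00fe"     # thorn character wraps each value

def read_dat(text):
    """Parse delimited DAT text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter=FIELD_SEP, quotechar=QUOTE
    )
    return list(reader)

def quote(value):
    """Wrap a value in the assumed quote character."""
    return QUOTE + value + QUOTE

# Build a tiny made-up DAT: one header row and one record.
header = ["BEGBATES", "DOCID", "AUTHOR", "TEXTPATH", "IMAGEPATH"]
row = ["ABC000001", "DOC-1", "J. Smith",
       "TEXT/ABC000001.txt", "IMAGES/ABC000001.tif"]
sample = "\n".join(
    FIELD_SEP.join(quote(v) for v in r) for r in (header, row)
)

records = read_dat(sample)
for rec in records:
    # Each record carries metadata plus pointers into the text and image folders.
    print(rec["BEGBATES"], rec["TEXTPATH"], rec["IMAGEPATH"])
```

Each row is one record: a Bates number, a document ID, some metadata, and pointers to the extracted text and the flat image, which is what a review platform uses to show you the document.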
You actually review it inside your review platform, which points to the text and the image. So if you were to look at it in your eDiscovery tool, you're going to see what looks like an email and you're going to have searchable text and it'll pull it up. And that is how you will see it. And so to you, once it's ingested, it looks like it's an email without having to open up the individual Outlook account. Let's go up to the 30,000 foot view and talk about the process of the workflow for eDiscovery. So let's think from the very beginning of the life of a little piece of evidence. And that is when it is created. So the fact that that document popped into existence, someone wrote an email, they wrote a Word document, they sent a text, they took a picture, that is the moment the record is created. And when that moment happens, all that information is captured on whatever device created it and that file is now imprinted with a bunch of information in its native state.
Then next in line, that record does what it does. It lives where it lives. It has its own life. And then all of a sudden there's a notice of a dispute. And that is when the preservation obligation kicks in. So maybe that file would've sat happily on its hard drive for the rest of its life like most of my stuff does, or maybe it was in a company that has a deletion cycle where if a record isn't touched or nothing special is done to it, it will get deleted after so many days or after a year, at some point. So the preservation obligation pops in, hopefully the responding party will take steps to preserve that document, so that document lives happily on its hard drive and somebody gets a notice not to delete it. Then there's the request for production that's sent from the requesting party to the responding party. And then the responding party has to go and find that little piece of responsive evidence and collect it. And how it's collected is important.
We're not going to go in a great deal of detail there about collections, but ideally when it's collected, it will be collected in a way that does not alter the metadata. There's a couple pieces of metadata that can get altered without much of a big deal and really can't avoid. But most of it should all be intact. They take all the collected data including our little friend, the native file, let's say Mr. Email, little email, and he gets processed. And so it's taken into the eDiscovery factory, basically. They have tools that will look at that document. They will basically open up the file, pull out the metadata and put it into a database. It will image that file and create the TIF. And it will extract the text without any errors, it should extract the text exactly the way it is in the document. And that's why we want it native extraction and not OCR because OCR does have errors. They will then run deduplication against it and filter things because there's a lot of noise in these collections.
A lot of documents that nobody should have to read, because that's how you waste time and money. Then there will be a review phase where the documents have to be reviewed. This is a lot like what you would see in a paper analog. But in this case, there's a way of filtering things and tagging things much faster, even though the volumes are greater. And so this is the review phase when it's tagged and coded and then it is ultimately put into a loadfile for production. And then that loadfile will be transferred over to the requesting party. That loadfile is then taken into the requesting party's eDiscovery tool, which should be a fairly simple job if it is done right; if the loadfile is right. And then the requesting party has the opportunity to look at the documents and do their own investigation. And the thing about that is, it's great because you can filter out the stuff you don't want. You can zero in on exactly what you do want and if the loadfile is correct, you'll have a lot of information, a lot of great useful information to sift through.
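The deduplication step mentioned in the processing stage above is usually described as comparing a fingerprint (a hash) of each document, so that exact copies are only reviewed once. What follows is a minimal sketch of that idea, not how any particular processing tool works. SHA-256 is used here as an assumption; tools may use other hash algorithms and may hash selected metadata fields rather than raw file bytes. The function and variable names are hypothetical.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Fingerprint a document's bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()

def deduplicate(docs):
    """Split (doc_id, bytes) pairs into unique documents and exact duplicates.

    Returns (unique_ids, duplicates), where duplicates maps each
    duplicate's id to the id of the first copy that was kept.
    """
    first_seen = {}   # hash -> id of the first document with that content
    unique_ids = []
    duplicates = {}
    for doc_id, data in docs:
        digest = content_hash(data)
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            unique_ids.append(doc_id)
    return unique_ids, duplicates

# Made-up collection: the same email collected from two custodians.
collection = [
    ("DOC-1", b"Quarterly numbers attached."),
    ("DOC-2", b"Lunch on Friday?"),
    ("DOC-3", b"Quarterly numbers attached."),
]
unique_ids, duplicates = deduplicate(collection)
print(unique_ids)
print(duplicates)
```

Keeping a record of which copy each duplicate points back to matters, because it lets you still say which custodians held the document even though only one copy goes to review.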
Then you can export batches of the data for use in depositions. I know a lot of attorneys still prefer to print out the actual pages for deposition. I myself have a preference for having physical papers to sift through and hand to people. Although now we're definitely transitioning into ways of showing it electronically and without having to shuffle, you can jump to pages pretty quickly. There are great tools for that now. And then all the way up to trial evidence, how is it going to be presented? At that point, it's probably going to be electronic, and there are entire industries around how electronic evidence is presented in the trial setting. So I want to talk about the really important place where this process begins after that record is created, which is when the preservation obligation kicks in. So otherwise that document just lives happily throughout its life and does its own thing and nobody really gives it any thought until there's a problem where we have a reasonably anticipated likelihood of litigation. So the clock starts when they get a preservation demand or when they should know that litigation is anticipated.
Sending out the demand is important because it establishes when the clock starts, it establishes the claims and it sets expectations, especially if you list targeted sources and custodians that you know might be relevant. It really puts the onus on the responding party to start kicking in and taking this matter seriously. I found that this is a really powerful time to show the other side that you mean business and that this is going to be a serious matter. It kicks in when there's reasonably anticipated litigation. And the respondent in most cases has an obligation, at a minimum, to issue a legal hold to custodians that might be holding that information and take steps to preserve the ESI. So if they just sit on their hands and don't do anything and stuff gets deleted, that's not an excuse. So it's unreasonable if they fail to issue a legal hold notice or take any steps to protect ESI sources.
The same goes if they wipe hard drives of key custodians, or they change or lose equipment, or they move to the cloud during the legal hold process. I've actually had a couple cases where that was going on, and that creates an interesting situation where a bunch of data is left behind and they're trying to claim that they don't have to produce it because they moved to the cloud. That's an interesting smoke screen sometimes for hiding information somebody doesn't want produced. The federal rules address what happens if there is a failure to preserve and whether or not sanctions are going to be required. So we're just going to read through it quickly.
FRCP 37(e): "If electronically stored information that should have been preserved in the anticipation or conduct of litigation is lost because a party failed to take reasonable steps to preserve it, and it cannot be restored or replaced through additional discovery, the court: (1) upon finding prejudice to another party from loss of the information, may order measures no greater than necessary to cure the prejudice; or (2) only upon finding that the party acted with the intent to deprive another party of the information's use in the litigation may" (and then this is where you get your penalties) "(A) presume that the lost information was unfavorable to the party; (B) instruct the jury that it may or must presume the information was unfavorable to the party; or (C) dismiss the action or enter a default judgment." So basically if it was accidental, then the party that failed to preserve has to take steps to reconstruct that data if necessary. And this is where the cost shifting really starts to step in. The cost of resolving that problem lands squarely on the party that failed to preserve.
If there is intention to deprive, which I have found sometimes in the past, then there is the opportunity for the judge to use a negative inference and instruct the jury that way, or if it's really egregious, they can dismiss the case. We don't see a whole lot of those, but it can happen. Usually the most likely thing is that the responding party is forced to reproduce stuff in a way that might be pretty costly to them. So it pays to get it right the first time. What that tells us is that they're probably covering up a part of the production; some evidence that might be really, really harmful to their case. So it's a signal that you should push harder.
Let's dive into the RFPs, because this is where I think most of the battle is won or lost. My RFP philosophy is that you need to be careful what you ask for. You need to be very precise in your language. It is an unfortunate habit, and I do it myself, to just try to throw out the broadest language possible so that there's no chance that anything could fall outside of it. What you've done in doing that is you've caught too much, and it gives them the opportunity to scream about the request being overly broad and overly burdensome. So we want to avoid that. You want to tailor your requests like spears and not nets. You need to understand what you need and how you're going to use it. So again, it doesn't help for you to start shrieking about native files if you don't really need the native files. Sometimes it's helpful to have the native files in the production. I usually ask for them. But you need to know what you want. Are you going to want emails? Do you need electronic medical records?
Those are a whole other animal and outside the scope of this, because they are tricky in their own way. You need to be measured and reasonable in what you're asking for. And give them enough rope. I usually like to sit back and see what they're going to do because nine times out of 10, they do something weird and now I've got some leverage to push back on them. That's when I start taking depositions. So in the scope of your request, I'm probably beating a dead horse here, but this is important. You want to be very specific. I have a breakdown. When I write RFPs, we give them absolutely no wiggle room on how we're going to ask for that. I'm going to get to that in a second with the form of production. You want to limit your time period and be reasonable. What is the reasonable time period you need? Limit that as much as possible to the period that you can justify asking for. You want to be clear on the sources. Are you going to look for databases and structured data?
That can be a nightmare, pulling information out of databases, and how to even handle that. That's one of the things like the electronic medical records because what you see as a user does not line up with how the data's stored. So that gets really, really crazy. Are you looking for shared drives or cloud storage? Do you know the custodians you're looking for specifically? If you know one of the parties that was emailing, then be very clear that you want that person's emails and documents. So what type of file types are you targeting? You can use the, includes but not limited to language, but be pretty specific about what you want to include. That way, at least you ask for that specific thing. Target file types specifically. You may not know that information yet, but if you can, then target it. So with regard to the form of production, you do have leverage being able to require how that data is produced.
So if you're quiet on that, which unfortunately a lot of RFPs I find are, you're going to get it in any way they can justify giving it to you and it's usually going to be not great. So under FRCP 34(b)(2)(E), you have a set of rules that give you some leverage. "Producing the Documents or Electronically Stored Information. Unless otherwise stipulated or ordered by the court, these procedures apply to producing documents or electronically stored information: (i) A party must produce documents as they are kept in the usual course of business or must organize and label them to correspond to the categories in the request." This one seems obvious, but what it means is they should not be printing emails that were digital and handing you printed emails. They should not be printing emails, scanning them and giving them to you as PDFs. That is unacceptable. "(ii) If a request does not specify a form for producing electronically stored information, a party must produce it in a form or forms in which it is ordinarily maintained or in a reasonably usable form or forms." This can go two ways.
Sometimes they will give you just a pile of native documents. Here's your hard drive. It's full of native documents. Which in and of itself isn't terrible. And depending on how they process them, if those are truly the original native documents, then you can just take them to your review platform and ingest them and make them searchable. That could work. Oftentimes you'll find that what they're giving you is not the original natives. They're the natives that they've attached to their loadfile, which are clones of the natives, and the metadata is not there anymore. They strip it out, so it's really a new file with some of the same guts. It looks like the same file, but it's not. So that's not ideal either. The other way that they might produce, if you don't specify, is again to give you just a bunch of printed pages, whether they're printed and scanned or printed to PDF, but they're not the native files. They don't have the metadata and they might be loosely searchable, but it's not ideal. It's not what you should be getting.
Or the reasonably usable form is they give you a loadfile with a bunch of fields missing. And then "(iii) A party need not produce the same electronically stored information in more than one form." And this is where giving them the instructions up front is important, because if they produce garbage and you can't make a good case for why that's not the format it should be in, they're going to give you a lot of flak when you try to get the court to force them to produce it again. So here's what we do. You need to specify your format. You can be very precise about all the fields and exactly how you want it to be. And this is standard eDiscovery stuff that any eDiscovery practitioner will be able to stand and put their hand on a Bible and say, "This is the way you should have produced it." That's what you want to ask for. Don't give them any wiggle room.
Searchable text must always stay searchable. You should not allow any OCR of native digital files. None at all. There are only a few exceptions, where the native file is unprintable or can't be... There are some exceptions when you're dealing with redactions, although native redactions are happening more and more, where you have to redact the extracted text and you can't really do it, so they have to redact the visible image and then OCR that image. But those are only the documents where the redactions happen, and they need to flag those and clarify that those have been OCR'ed rather than natively produced. You want to be very specific about all the standard metadata fields that must be included, and I have included a handout that shows the DOJ standard list. There are 30-some-odd metadata fields. They're not all going to matter for every file, but you have that field, that record, there available. If there is information, it gets populated. An email has sent-to and from fields whereas a Word document does not.
So there are things that don't always make sense depending on the file, but you need to specify that those need to be there. Because if you don't, they're not going to give it to you. Here's the case that I got. I had a case where I was consulting with a district attorney's office and they were doing a civil enforcement on some consumer matters and they made a request for production and they got a loadfile. The load file had five fields, basically the Bates number and a couple other fields that were almost meaningless. And it didn't have any other stuff with the fields. So the respondent was arguing "Well, they didn't specify what fields they needed so we just gave them this amount." And in my declaration, I had to point out that those fields are generally automatically included that they should have produced and that they had to manually turn off those fields in the production.
The analogy was, we had ordered a pizza and because we didn't specify that it was baked or have cheese, they gave us dough and sauce. And apparently the judge really liked that and forced them to produce the whole thing. But that's the kind of stuff we don't want to give them any room to move on. You want those instructions in your RFPs. I do that by putting an addendum of instructions on every RFP that is issued and reference it in establishing the RFP rules. So in your RFP, you want to specify the loadfile structure. And again, the loadfile is the structured data that you're going to be importing into your platform. And that's where those fields are and that's where you specify how the TIFs are handled and how the text is going to be handled. And specifically that you're not going to allow them to OCR stuff unless they're redacted, that these instructions are unambiguous, that you're going to shape those instructions for your review platform and that you're going to expressly require those metadata fields that you want.
And we can give them that list that's part of the addendum that you're going to provide. Now you could do everything right and you're still going to get the objections. This is just the way this goes. So they're going to make their common objections: that the request is overbroad or the scope is too large, that it's unduly burdensome, or that the source is inaccessible. The volume of data is too large, or you're asking them to pull from a legacy source. Usually it's going to be combined with some vague claim about costs, the collection costs, the review costs. And the other thing about this is that these arguments tend to signal a potential pressure point where it is probably worth digging deeper to try to get to that data.
As an example, I just assisted on a deposition of the director of IT for a company where the responding party, which was a skilled nursing facility, had made claims that the email that was being requested was simply going to be too expensive to collect, that it was on a legacy system, and this was on Lotus Notes of all things, which was fun because I actually hadn't had a case involving Lotus Notes. They had maintained a Lotus Notes server and preserved everything for these old emails. And they gave the numbers of how many hours it was going to take to go and pull the data and this and that. And so we took his deposition and got very specific about exactly what the process was going to be and how much it was going to cost.
When it came down to it and when I wrote my affidavit in this case, I was able to point out that the cost of taking the deposition for the respondent was significantly higher than the cost would've been had they just gone ahead and pulled the data off of these supposedly inaccessible data stores on the Lotus Notes server that was offline. In that case, again, I never really got to see what was going to happen on the other side of this because they folded and the case settled, but I was pretty confident, and I think they were, that we were going to win that argument. And they must not have liked what was on those emails. So in overcoming objections, the reasonable party wins. And if you start your RFP out with a giant sweep of unfiltered, unfocused nets, you're going to look unreasonable. If you're very focused, I know you're going to risk potentially leaving stuff on the table, but you can always come back for an expansion if there's a reason. But if you're focused, and each RFP is its own focused spear, you're going to look more reasonable.
You narrow the scope if it's not already narrow. Know what you're looking for and why. Don't go on a fishing expedition because it's going to be pretty obvious. And then there's going to be a push from the other side to have you focus on search terms. And that is definitely something you have to consider. Search terms are a necessary evil in this day and age, but you can get into all kinds of trouble with search terms. So you've got to be very mindful about how search terms are done, and it's an iterative process. If they're expecting you to come up with a list of search terms and just throw it over the fence so they can go pull that data, that's not a good way to do it. So you need to start thinking about a process where you're going to give them search terms, they're going to tell you what the results are, and you're going to agree on changed search terms. This generally only works if the other side is cooperative.
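To make that back-and-forth concrete, here is a minimal Python sketch of a search-term hit report, the kind of result the two sides can trade each round. It is only an illustration: the document IDs, sample text and terms are hypothetical, and a real review platform will have its own search syntax and reporting.

```python
import re


def hit_report(documents, terms):
    """Map each search term to the sorted IDs of documents that contain it.

    documents: dict of document ID -> extracted text (hypothetical sample data).
    terms: plain words or phrases, matched whole-word and case-insensitively.
    """
    report = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        report[term] = sorted(
            doc_id for doc_id, text in documents.items() if pattern.search(text)
        )
    return report


def zero_hit_terms(report):
    """Terms that found nothing; candidates to rework in the next round."""
    return [term for term, doc_ids in report.items() if not doc_ids]


if __name__ == "__main__":
    # Hypothetical documents and terms, for illustration only.
    docs = {
        "DOC001": "Severance terms attached.",
        "DOC002": "Lunch on Friday?",
        "DOC003": "Discuss severance and the termination letter.",
    }
    report = hit_report(docs, ["severance", "termination", "layoff"])
    for term, doc_ids in report.items():
        print(term, len(doc_ids), doc_ids)
    print("rework:", zero_hit_terms(report))
```

A report like this gives both sides the same numbers to argue from: terms that hit everything are too broad, and terms that hit nothing need a rethink before the next round.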
If they're using search terms to search for stuff, they need to share them with you and you're probably going to have to sign off on them. So search terms can be a good thing but also really dangerous if you're not careful about how you structure things. I had one case, an employment case, where the issue involved a person with a visual impairment who was suing the company, claiming discrimination based on that. We had to really think outside the box because we didn't have machine learning at that time that could have even filtered for this stuff, and nobody's going to call him blind. So what are the other ways that one might refer to visual impairment? We had to come up with some creative things to look for to see if any communications in the emails matched that. Also, when you're dealing with objections, you've got to be ready to argue cost. Usually the respondent comes with a vague claim that the costs are going to be too high and doesn't really do a good job of supporting it.
In federal rules, generally the courts are going to force them to make specifications in their objection because they sometimes will be deemed waived if they just make a vague general objection. But if you come ready to fight with some actual documents, some actual evidence around arguing costs like we did in that case involving pulling the data off of the Lotus Notes servers, then you're going to be able to have an objective standard and measure to show what the actual costs are going to be. So if you can come with some provable objective measures of what costs are likely to be, even a range, you're going to be able to overcome their vague statements of costs. And if they start getting specific, that's actually really good because now we can start getting into the weeds and that's where I frequently am able to win because once they really get down to it, it's really not going to be that expensive if they do it right.
Or we can, again, narrow something and limit something so that the costs are going to be much more manageable. So assuming you have overcome their objections and they've gone and done their production and now they've handed you the loadfile which you've downloaded from the online shared link, you're going to have to take a look at it. So I spend a lot of my time looking at loadfiles just to see if they're put together properly. So I will usually download a copy of it myself and just store it locally and look at it. I'll look at the DAT file and I'll usually push that into an Excel spreadsheet. I can read the DAT file like the Matrix when you're reading all the green code; I've just trained myself to be able to see that, but it looks like a lot of noise. What I'm looking for is the fields. I want to see how many fields there are and I want to know what they are. And if I can see right away that they've only given me five fields, then we're going to have a bit of a conversation with the respondents.
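As a rough illustration of that first look at the DAT file, here is a small Python sketch that lists the fields in the header row, flags any required fields that are absent, and counts how often each field is actually populated. It assumes the common Concordance-style delimiters (ASCII 20 between fields, the thorn character around each value), which this talk does not specify, and the required field names are hypothetical placeholders for whatever your own ESI instructions list.

```python
# Assumed Concordance-style delimiters; confirm against the actual production.
FIELD_SEP = "\x14"  # ASCII 20 separates fields
QUOTE = "\u00fe"    # thorn character wraps each value

# Hypothetical required fields; substitute the list from your ESI instructions.
REQUIRED_FIELDS = ["BEGBATES", "ENDBATES", "CUSTODIAN", "FROM", "TO", "DATESENT"]


def read_dat_header(text):
    """Return the field names from the first line of a DAT loadfile."""
    first_line = text.split("\n", 1)[0].rstrip("\r").lstrip("\ufeff")
    return [name.strip(QUOTE).strip() for name in first_line.split(FIELD_SEP)]


def missing_fields(header, required=REQUIRED_FIELDS):
    """Required field names that do not appear in the header (case-insensitive)."""
    present = {name.upper() for name in header}
    return [name for name in required if name.upper() not in present]


def populated_counts(text):
    """Count, per field, how many records carry a non-empty value."""
    lines = [ln.rstrip("\r") for ln in text.split("\n") if ln.strip()]
    header = read_dat_header(text)
    counts = {name: 0 for name in header}
    for line in lines[1:]:
        values = [v.strip(QUOTE) for v in line.split(FIELD_SEP)]
        for name, value in zip(header, values):
            if value.strip():
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up loadfile: two fields, two records, one empty custodian.
    sample = (
        "\u00feBEGBATES\u00fe\x14\u00feCUSTODIAN\u00fe\n"
        "\u00feABC0001\u00fe\x14\u00feSmith\u00fe\n"
        "\u00feABC0002\u00fe\x14\u00fe\u00fe\n"
    )
    header = read_dat_header(sample)
    print("fields:", len(header), header)
    print("missing:", missing_fields(header))
    print("populated:", populated_counts(sample))
```

A short field list, or fields that exist but are almost never populated, is the cue for that conversation with the respondent.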
The other thing I'm looking at is what the image files look like. I can just sample everything and see that the structure is good. And then we take that and we hand it off to our review platform for ingesting. So the things we're going to look at: the metadata, like I just said. I'm going to look at that DAT file and make sure that the metadata is there. We're going to want to look at that in context too, to make sure that the metadata actually makes sense. I just had a case, again, where a bunch of metadata was missing. They were very inconsistent about what they provided. And when I brought up a complaint about it, they provided another loadfile, or another DAT file, to overlay on top of that one, and my question is, why wasn't it there in the first set? If you had that information, why didn't you provide it? So look at the loadfile structures. We're going to want to make sure that there are tags that tell us what requests they're responding to. Hopefully that's there.
If not, we might have an issue, because the file structure is definitely not going to be the way it was natively. We're going to look for the privilege log and make sure that it makes sense and that there's enough information in it so that we can tell what was actually withheld. I'm going to look at the redacted files and the redaction log to see if there are any concerns: did they get a little too redaction-happy with the blackout tool and take out a bunch of stuff that should be available to us? So now you've got the documents. You've got your document set. We want to consider that there could be spoliation or something missing. I would presume that usually that information's there and they just didn't produce it. The quintessential case, the [inaudible 00:43:19] case, was the big sea change in 2006 or 2005. The main thing with that case was it was a discrimination claim by a financial broker, and they requested a bunch of documents as a plaintiff.
The documents they got back were missing a number of emails that the plaintiff actually had copies of. It's go fish. And that's really the only way you're going to know something's missing: if you have something that necessarily should be in the data set that you requested and that file is missing. And then you can start making the argument that, okay, there are gaps here. Why is that? And you can start taking depositions. You can depose the IT professional, maybe the CIO or the director of IT. And the good thing about that is that they're going to give you straight answers. With a technical operator, they can prep them as much as they want, but the questions are the questions, and they take their job seriously and they're going to answer objectively. So you're going to be able to tell how something was collected or whether it was missing, and then come back and make the inference that it was intentional or just accidental and they lose. So the way to know that you've got something missing is to have a control data set.
And hopefully you have that. Hopefully your plaintiff or someone has something that you know should be in that data set. Maybe you can get a side witness or somebody to produce something to you or somebody to help you know that data set is complete. You can get a lot of mileage out of deposing IT and some key custodians. Usually it just takes that one deposition and I don't get to see much more because the case always settles. Mileage may vary, though. It may not happen all the time. And then you might need forensic analysis. If you can show that there is information missing, then maybe you can get the court to order access to their actual drives. So it's costly, but at that point hopefully you can get a judge so upset that all that information is missing and that they will order sanctions and shift the costs specifically to the respondent to produce and pay for the forensic analysis so that you can determine whether or not things are missing.
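The comparison itself is simple once you have a control set. A minimal Python sketch, assuming each email can be identified by some stable key such as its Message-ID header (that choice of key, and the sample values, are assumptions for illustration, not something prescribed in this talk):

```python
def normalize_id(message_id):
    """Trim whitespace and the angle brackets that often wrap a Message-ID."""
    return message_id.strip().strip("<>")


def find_gaps(control_ids, produced_ids):
    """Return control-set identifiers that do not appear in the production."""
    produced = {normalize_id(i) for i in produced_ids}
    control = {normalize_id(i) for i in control_ids}
    return sorted(control - produced)


if __name__ == "__main__":
    # Hypothetical identifiers: what the client kept vs. what was produced.
    control = ["<a1@example.com>", "<b2@example.com>", "<c3@example.com>"]
    produced = ["a1@example.com", "c3@example.com"]
    for gap in find_gaps(control, produced):
        print("in control set but not produced:", gap)
```

Every identifier this prints is a document you know existed and did not receive, which is exactly the kind of concrete gap to bring to a deposition of IT.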
So I know that we are drinking from the fire hose. We only have an hour to go through this. And honestly, we could talk about this stuff for days. But at this level, I hope that you're more comfortable with it, and maybe you'll go through the glossary I have attached and review some of these key terms. I hope you come out of here knowing what's in a loadfile and what native files actually are: that the native files are the original files, and that in your production set from your loadfile you're going to have the images and the text and the database that you're going to be able to review more quickly on your tool. I hope you understand that you've got to be specific about your RFPs and not just put a broad sweep out there. You've got to be very tailored in how you craft your RFPs. I can't stress that enough. And do not accept paper or anything that was printed and then OCR'ed, with very few exceptions. If you are getting printed and OCR'ed stuff, you are being manipulated. There's just no way around it.
That is not optimal for running discovery. Now, some closing thoughts on eDiscovery. And this will wrap up our session. We are in the 21st century and eDiscovery is basically imperative. I think I've gotten that across. I just can't stress enough that at this point in time, not addressing eDiscovery and coming up with a plan is like not having a phone because you prefer teletype. eDiscovery can be the tail that wags a dog. I will come in where my clients have been banging their head against a problem and we start taking depositions or we start making some noise. And that case settles. What I can only infer is happening is that there was something pretty damning in a set of emails or sometimes their documentation, and they had successfully blocked the requesting party from getting it. And by pushing harder, they realized they were going to have to give it up and they wanted to settle before it came out. You got to develop a plan and the tools before a matter goes live. You just don't have the time to be learning a tool or fussing around with it with real data.
I know you've got to set aside time. It's a pain in the neck, but you really should just take the lessons and take the tutorials with your tool. Pick a tool and get ready to use that tool when you have a live matter, even if you're not actively storing data on it yet. And then it's a really good idea to create templates for your preservation notices and your RFPs with the ESI instructions. That's another thing: once I get involved and I help a firm craft their documentation for that, they don't really need me again for that part, although I do have a lot of repeat customers because every case is different. We go back through that specific case and figure out what the tailored RFPs are, what that laser-focused request is going to be, so that you're going to get that smoking gun and leave them no windows to get out of it. But it's good to have these templates, to know your ESI instructions backwards and forwards, and to just attach them to the file.
So hopefully you've got at least a glimmering understanding of how eDiscovery is structured, how you can use it to your advantage and why you need to be using it. Thank you for your time. Take care.