Digital Asset Management & eDiscovery: Finding What You Need
Posted by Andrew Forber on Tue, Feb 02, 2010 @ 11:00 AM
There was a time not so long ago when databases ruled the world.
When I say “databases” I’m referring to huge collections of data, all neatly slotted into long, wide tables in precise order, last name first, first name last. Remember the phone book, neatly arranged by correctly-spelled last name, first name, and initials and stamped onto dead tree matter? And there are military service records, and credit card account statements, all nice and precise and regular, every bit organized and in its proper place. There are rules, and math, and Codd’s Twelve Rules for Relational Databases. Those things still exist, of course, and a relational database is still a great and useful invention for keeping organized, regular, disciplined data.
Great, but. It’s really easy to lose data in a relational database. Oh, it’s in there, but you have to find it. Don’t know how to spell a name? One digit wrong in the account number? The description has a paragraph in it somewhere that talks about the CFO’s villa in Tuscany? Oops, no hits. You may be out of luck.
Over the years people have come up with ways to find data they couldn’t have found otherwise. For example, Soundex codes were invented to find people in databases even though their names might have been misspelled. Alongside the person’s name, each record carried a code consisting of a letter and three digits, derived from how the name sounds rather than how it is spelled. Even if the name was spelled wrong you would often still be able to find it, because the misspelling usually produced the same code. You’d just have to look at more records to find the one you wanted. (And we’re talking about really old databases here – filing cabinets, in fact. Soundex was first patented in 1918!) Soundex, in a way, was the first tool for what we now call fuzzy search.
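To make the Soundex idea concrete, here is a minimal sketch in Python of the commonly described American Soundex rules (keep the first letter, turn the following consonants into digits, skip vowels, pad to four characters). The function names and the sample people are made up for illustration, and this is a simplified teaching version, not the code any particular product uses.

```python
from collections import defaultdict

# Consonant groups that share a digit in the usual Soundex scheme.
CODES = {}
for group, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in group:
        CODES[ch] = digit


def soundex(name):
    """Return a letter plus three digits, e.g. 'R163' for 'Robert'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of same-coded consonants
        code = CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets the run, so a repeated code counts again
    return (result + "000")[:4]


def build_index(full_names):
    """Group hypothetical 'First Last' names by the Soundex code of the last name."""
    index = defaultdict(list)
    for full_name in full_names:
        index[soundex(full_name.split()[-1])].append(full_name)
    return index


people = ["Ashok Patel", "Anil Patil", "Maggie Smith", "Margaret Smyth"]
index = build_index(people)

print(soundex("Asok"), soundex("Ashok"))  # A220 A220
print(index[soundex("Patell")])           # ['Ashok Patel', 'Anil Patil']
```

The misspelled "Patell" still lands on the right code, but the lookup returns two people rather than one. That is the trade: a looser match, and more records to look through.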
At the other end of the spectrum there’s a new technology ruling the world now: the chaos of the World Wide Web, and the search engines we all know and love. The data is messy, and meant for humans to read: it’s not useful to the average computer program. Finding the exact thing you want on the Web is often as much a challenge as it is in relational databases, because of the mass and messiness of the data, and the breadth of the fuzzy search tricks programmers have come up with. Looking for Maggie Smith or Asok Patel? Did you mean Ashok Patel? Here are the top 5 results for that. Be prepared to wade through a lot of data.
In fact it’s unavoidable that any time you try to create a tool that makes searching easier by requiring less precision, you pay for it by having more results to pore through. The biggest issue in web search is how to figure out what the human really wanted, and understand the contents of the database well enough to deliver just that. Oh, you’re looking for THAT Asok? The guy you went to school with? Let’s try to find him through the other organizations you’ve worked with, schools you went to, friends you had in common, companies that hire people with the same education … Computer scientists all over the world are racing to find solutions to the problem on the web. The problem will only be solved, perhaps, when the computers are truly more intelligent than we are.
In electronic discovery and in digital asset management we have a similar problem, with a smaller scope. Collections are smaller, perhaps no more than several million documents. We still need full-text search and the tools that allow searches to be fuzzy, but we need to pick out documents by their structured data at the same time. Find everything to do with the debt-hiding corporations, but only from the Chicago office in January 2003, and don’t let Fred over there see the ones marked Privileged. This kind of work requires a hybrid of the fuzzy and the precise: loose questions and strict rules, both, letting real people answer real questions in real time. That’s a specialty of ours at MerlinOne, and our technology continues to develop so that our customers, over time, will have the best of both worlds.
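As a rough illustration of that hybrid, the sketch below combines a loose text match with strict filters on office, date, and privilege. Everything in it is an assumption made for the example: the Document fields, the hybrid_search function, and the use of Python's standard difflib module as a stand-in for a real full-text engine. It is not a description of how MerlinOne's software works.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Document:  # hypothetical record layout
    doc_id: int
    text: str
    office: str
    created: date
    privileged: bool = False


def fuzzy_contains(text, term, threshold=0.8):
    """True if any word in text is similar enough to term (the loose part)."""
    term = term.lower()
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")
        if SequenceMatcher(None, word, term).ratio() >= threshold:
            return True
    return False


def hybrid_search(docs, term, office, start, end, may_view_privileged):
    """Apply the strict rules first, then the fuzzy text match."""
    return [
        d.doc_id
        for d in docs
        if d.office == office
        and start <= d.created <= end
        and (may_view_privileged or not d.privileged)
        and fuzzy_contains(d.text, term)
    ]


docs = [
    Document(1, "Memo on the debt-hiding corporations.", "Chicago", date(2003, 1, 15)),
    Document(2, "Draft: offshore corporatoins and their debts", "Chicago",
             date(2003, 1, 20), privileged=True),
    Document(3, "Debt-hiding corporations overview", "New York", date(2003, 1, 10)),
    Document(4, "Corporations and debt", "Chicago", date(2003, 2, 3)),
    Document(5, "Cafeteria lunch menu", "Chicago", date(2003, 1, 8)),
]

january = (date(2003, 1, 1), date(2003, 1, 31))
print(hybrid_search(docs, "corporations", "Chicago", *january, may_view_privileged=False))  # [1]
print(hybrid_search(docs, "corporations", "Chicago", *january, may_view_privileged=True))   # [1, 2]
```

The misspelled "corporatoins" in document 2 is still found, but only by someone allowed to see privileged material. Fred gets document 1 and nothing else.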