We finished Part One of this blog post with a simplified example demonstrating how to measure document similarity across a given corpus using a document-term matrix as a starting point. Let’s now turn back to R and our larger sample …
Tag: Relativity
Understanding Near-Duplicate Identification [Part One]
Near-duplicate identification is one of the more common textual analytics tools used in eDiscovery. Not to be confused with document deduplication, which relies on hash values, near-duplicate identification calculates document similarity based on textual content. For example, if you had …
Basic eDiscovery Review Workflow [Part Two]
Part One of this blog post introduced this basic model workflow and discussed how to implement it in a Relativity review environment. Now, a little bit about the fields, why they’re important, and how we can use them throughout the review …
