You need to de-dupe the data before it goes over, and it's a task that nobody wants to do. SFDC has a de-dupe tool that you can access, but it isn't brilliant.
The best way is through a data cleanup project. I would export the data into Excel, run a data cleanse exercise to get everything as you need it, and upload it fresh. This is the best way to guarantee clean data unless you buy RingLead, which is fantastic but not cheap.
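If the export is large, part of the cleanse can be scripted instead of done by hand in Excel. The following is only a rough sketch, assuming the export is saved as CSV and that duplicates can be matched on an Email column; the column names, the sample rows, and the helper names (normalize, dedupe_rows, dedupe_csv_text) are made up for illustration and are not part of any Salesforce tooling.

```python
import csv
import io


def normalize(value):
    """Lowercase and collapse whitespace so near-identical values compare equal."""
    return " ".join(value.lower().split())


def dedupe_rows(rows, key_fields=("Email",)):
    """Keep the first row seen for each normalized key; return (kept, dropped)."""
    seen = set()
    kept, dropped = [], []
    for row in rows:
        key = tuple(normalize(row.get(field) or "") for field in key_fields)
        if not any(key):
            # No usable key: keep the row so it can be reviewed by hand.
            kept.append(row)
        elif key in seen:
            dropped.append(row)
        else:
            seen.add(key)
            kept.append(row)
    return kept, dropped


def dedupe_csv_text(text, key_fields=("Email",)):
    """Parse CSV text with a header row and de-dupe it on the given columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return dedupe_rows(rows, key_fields)


# Hypothetical sample export; real column names depend on your own data.
SAMPLE = """Email,FirstName,LastName
jane@example.com,Jane,Doe
 JANE@example.com ,Jane,Doe
bob@example.com,Bob,Smith
"""

if __name__ == "__main__":
    kept, dropped = dedupe_csv_text(SAMPLE)
    print(len(kept), "kept,", len(dropped), "dropped")
```

Exact matching on a normalized email only catches the easy duplicates. Misspelled names or records with different emails would still need a manual pass, so it is worth reviewing the dropped rows before uploading anything.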
It's definitely not a fun task, but it's the one we believed would accomplish what we're looking to do.
We've entertained RingLead, but we also ran into the price tag issue. I appreciate your thoughts on this!