So you think you have data quality challenges?

One of the issues faced by most CRM systems is handling (preferably avoiding) duplicates. Inaport provides a range of different matching techniques to assist in avoiding duplicates when importing data.

This interesting article in the New York Times provides some insight into the difficulties faced in China. By some estimates, just 100 surnames cover 85% of China’s 1.3 billion citizens. By contrast, 70,000 surnames cover 90% of American citizens.

Chinese citizens try to overcome some of the potential for confusion by creative use of the extensive Chinese character set of 55,000 characters. Unfortunately, this runs directly counter to the government’s efforts to computerise and standardise on a set of “only” 32,252 characters. An even more restricted list of 8,000 approved characters is to be issued later this year. This is leading to situations where people cannot get identity cards issued, because the characters used in their names are not available in the government systems.

Inaport supports Unicode, so in principle it can be used for matching against the full Chinese character set. However, I have to confess that (so far) we have not had to put this to the test, even though Inaport is currently being used by customers in China (and Japan). Should this happen, I’ll update the post with the results.
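To give a feel for what Unicode-aware matching involves, here is a minimal sketch in Python. It is not how Inaport does it; the helper names `match_key` and `is_probable_duplicate` are made up for illustration, and the choice of NFKC normalisation plus case folding is an assumption about what a reasonable first pass might look like. The point is only that two strings which look the same on screen can be stored as different code points, so names should be normalised before they are compared.

```python
import unicodedata


def match_key(name: str) -> str:
    """Build a comparison key for a name (hypothetical helper, not an Inaport API)."""
    # NFKC folds compatibility forms (e.g. full-width Latin letters and the
    # ideographic space) into their ordinary equivalents and composes accents.
    normalized = unicodedata.normalize("NFKC", name)
    # Collapse runs of whitespace and trim the ends.
    collapsed = " ".join(normalized.split())
    # Case folding only affects scripts that have case; Chinese is unchanged.
    return collapsed.casefold()


def is_probable_duplicate(a: str, b: str) -> bool:
    """Treat two names as duplicates when their normalised keys are equal."""
    return match_key(a) == match_key(b)


if __name__ == "__main__":
    # Full-width letters and an ideographic space versus plain ASCII.
    print(is_probable_duplicate("Ｌｉ　Ｗｅｉ", "li wei"))
    # Same characters, stray whitespace.
    print(is_probable_duplicate("李伟", " 李伟 "))
    # Different second character, so a different name.
    print(is_probable_duplicate("李伟", "李玮"))
```

Normalisation like this only removes encoding differences. It does nothing about the harder problem described above, where millions of people genuinely share the same name, so in practice a name match would need to be combined with other fields before treating two records as the same person.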

Regards

David
