<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>InaPlex Blog &#187; Data Quality</title>
	<atom:link href="http://blog.inaplex.com/category/data-quality/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.inaplex.com</link>
	<description>CRM Integration</description>
	<lastBuildDate>Sun, 05 Feb 2012 21:36:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.inaplex.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>InaPlex Blog &#187; Data Quality</title>
		<link>http://blog.inaplex.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.inaplex.com/osd.xml" title="InaPlex Blog" />
	<atom:link rel='hub' href='http://blog.inaplex.com/?pushpress=hub'/>
		<item>
		<title>Insights Presentation &#8211; Integration and Migration Projects</title>
		<link>http://blog.inaplex.com/2009/06/04/insights-presentation-integration-and-migration-projects/</link>
		<comments>http://blog.inaplex.com/2009/06/04/insights-presentation-integration-and-migration-projects/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 21:01:07 +0000</pubDate>
		<dc:creator>inaplex</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Inaport]]></category>

		<guid isPermaLink="false">http://blog.inaplex.com/?p=78</guid>
		<description><![CDATA[We have had a number of requests for a copy of the presentation made at Sage Insights in Nashville, May 2009. The PowerPoint is here, and the PDF is here. The focus of the presentation was issues in the management of integration projects, and a suite of tools that assist. In particular, there was discussion of [...]]]></description>
			<content:encoded><![CDATA[<p>We have had a number of requests for a copy of the presentation made at Sage Insights in Nashville, May 2009.</p>
<p>The PowerPoint is <a href="http://inaplex.files.wordpress.com/2009/06/inaplex_insights2009_crm28.pptx">here</a>, and the PDF is <a href="http://inaplex.files.wordpress.com/2009/06/inaplex_insights2009_crm28.pdf">here</a>.</p>
<p>The focus of the presentation was issues in the management of integration projects, and a suite of tools that assist. In particular, there was discussion of version control mechanisms using the open source Subversion system as the main tool, with client UI provided by TortoiseSVN and server hosting provided by Assembla.</p>
<p>InaPlex strongly recommends using version control on projects because it provides a large measure of safety and control, and makes it much easier for groups to work together safely.</p>
<p>Inaport supports version control because profiles built are just XML files, which can be versioned safely.</p>
<p>If you have questions about using version control with projects, please do not hesitate to contact us.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inaplex.com/2009/06/04/insights-presentation-integration-and-migration-projects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/586002b536410e4020b4a1a259e3e871?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">inaplex</media:title>
		</media:content>
	</item>
		<item>
		<title>SQL for Data Quality</title>
		<link>http://blog.inaplex.com/2009/05/17/sql-for-data-quality/</link>
		<comments>http://blog.inaplex.com/2009/05/17/sql-for-data-quality/#comments</comments>
		<pubDate>Mon, 18 May 2009 04:35:40 +0000</pubDate>
		<dc:creator>inaplex</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.inaplex.com/?p=47</guid>
		<description><![CDATA[I gave a workshop on managerial and technical challenges of integration projects at Sage Insights in Nashville last week. One of the areas I covered was some useful SQL queries for checking data integrity during a migration; the example used was from a Siebel to SageCRM migration project we are currently engaged in. I had [...]]]></description>
			<content:encoded><![CDATA[<p>I gave a workshop on managerial and technical challenges of integration projects at Sage Insights in Nashville last week.</p>
<p>One of the areas I covered was some useful SQL queries for checking data integrity during a migration; the example used was from a Siebel to SageCRM migration project we are currently engaged in. I had a number of requests from audience members for details of the queries, so here they are.</p>
<p><span id="more-47"></span></p>
<p>An Inaport profile had been used to move Accounts and Contacts from the Siebel database to SageCRM. After the profile had been run, a simple sanity check was used:</p>
<pre style="padding-left:30px;">select count(*) from accounts
select count(*) from crm_db.dbo.company</pre>
<p>The first point to note is that we can do a cross-database query &#8211; the second query above is against the SageCRM database, even though the query window is currently connected to the Siebel database.</p>
<p>The more important point is that the count of Accounts in Siebel is 16,538, but only 7,794 in SageCRM &#8211; a difference of 8,744. This is probably not correct, and needs investigation. The first step is to confirm the problem:</p>
<pre style="padding-left:30px;">select count(id) from accounts
  where id not in
  (select comp_siebelrowid
    from crm_db.dbo.company)</pre>
<p>This query is an example of using a sub-query &#8211; it pulls all the Siebel ids from the SageCRM table, then finds which ids from the source Siebel table have not made it across. The result shows that 8,744 ids are not there, which matches the difference between the simple counts above.</p>
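<p>To see which accounts are missing, not just how many, a listing along the following lines can help. This is a minimal sketch rather than the query used on the project: it assumes SQL Server (for <em>top</em>), reuses the acntName column that appears later in this post, and uses NOT EXISTS, which gives the same result as NOT IN here as long as comp_siebelrowid contains no NULLs:</p>
<pre style="padding-left:30px;">-- sample of Siebel accounts with no matching SageCRM company
select top 20 a.id, a.acntName
  from accounts a
  where not exists
    (select 1 from crm_db.dbo.company c
      where c.comp_siebelrowid = a.id)
  order by a.acntName</pre>
<p>A small sample of the missing rows gives something concrete to compare against the source query.</p>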
<p>The most likely problem is that the query being used in the Inaport profile is not getting all the Accounts. The query used was:</p>
<pre style="padding-left:30px;">select * from accounts a inner join
  contacts c on a.id = c.acntid</pre>
<p>Doing a count on this returns 11,282 rows. Initially this does not look like it matches the numbers above, but the explanation is simple &#8211; some accounts have more than one contact, so the inner join produces more than one row for those accounts. A quick look at the Inaport log file for details on the Company table shows 7,794 inserts and 3,488 updates, which add up to 11,282 and tie in with the numbers above.</p>
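<p>Both figures can be checked in a single pass over the same join. The following is a sketch using only the tables and columns already shown (the column aliases are illustrative); if the log figures are right, it should report 11,282 joined rows and 7,794 distinct accounts:</p>
<pre style="padding-left:30px;">-- rows produced by the inner join vs. distinct accounts among them
select count(*) as joined_rows,
       count(distinct a.id) as accounts_with_contacts
  from accounts a inner join
  contacts c on a.id = c.acntid</pre>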
<p>The experienced reader will see the issue by now: a <em>left outer join</em> should have been used instead of an <em>inner join</em>. An inner join will return only rows from the Accounts table that have a matching row in the Contacts table, i.e. Accounts with Contacts.</p>
<p>A left outer join will return <em>all</em> rows from the Accounts table (even if the Account does not have a Contact), and any rows from Contacts that match. Running the following query shows the effect:</p>
<pre style="padding-left:30px;">select count(*) from accounts a
  left outer join contacts c
    on a.id = c.acntid</pre>
<p>This returns 20,026 rows, which looks a lot more reasonable. To be absolutely sure, the following query gives us a count of the number of distinct account ids in the query (notice that the sub-query needs to be named as a virtual table):</p>
<pre style="padding-left:30px;">select count(*) from
  (select distinct a.id from accounts a
    left outer join contacts c
      on a.id = c.acntid) sq</pre>
<p>This query returns 16,538, which is the correct number of accounts. So we have now confirmed that a left outer join query will return the correct number of accounts; this means that there are some accounts with no contacts.</p>
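<p>The accounts with no contacts can also be counted directly. In the sketch below, unmatched accounts are the rows where the left outer join found nothing on the contacts side, so c.acntid comes back NULL. From the figures above we would expect 20,026 minus 11,282, which is 8,744, the same as the number of accounts missing from SageCRM:</p>
<pre style="padding-left:30px;">-- accounts that have no contact at all
select count(*) from accounts a
  left outer join contacts c
    on a.id = c.acntid
  where c.acntid is null</pre>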
<p>Now that we are checking data integrity, the obvious question is, are there any contacts that are not linked to accounts? The following query tells us:</p>
<pre style="padding-left:30px;">select count(*) from contacts
  where (acntid is null)
  or (acntid not in
        (select id from accounts))</pre>
<p>This shows us that there are 2,099 contacts that are not linked to accounts &#8211; they will therefore not be picked up by any query that has a join on account, and will need to be handled separately.</p>
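<p>It can be useful to know why each of these contacts is unlinked. The following sketch splits them into contacts with no account id at all and contacts whose account id points at an account that does not exist; the reason and contact_count labels are only illustrative. The two counts should add up to the total above:</p>
<pre style="padding-left:30px;">-- break unlinked contacts down by cause
select case when c.acntid is null
         then 'no account id'
         else 'unknown account id' end as reason,
       count(*) as contact_count
  from contacts c
  where c.acntid is null
    or not exists
      (select 1 from accounts a where a.id = c.acntid)
  group by case when c.acntid is null
         then 'no account id'
         else 'unknown account id' end</pre>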
<p>A final test &#8211; do we have any duplicates in the account table? A query to check for duplicates on company name is:</p>
<pre style="padding-left:30px;">select acntName, count(acntName)
  from accounts
  group by acntName
  having count(acntName) &gt; 1</pre>
<p>This shows us each account which has more than one record with the same name; it also shows us how many records there are for each name.</p>
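<p>Names that differ only in case or in stray leading and trailing spaces may not be grouped together by the query above, depending on the collation of the database. A hedged variant that normalises the name first (normName and records are illustrative aliases, and the functions used are the SQL Server ones):</p>
<pre style="padding-left:30px;">-- duplicates after trimming spaces and folding case
select upper(ltrim(rtrim(acntName))) as normName,
       count(*) as records
  from accounts
  where acntName is not null
  group by upper(ltrim(rtrim(acntName)))
  having count(*) &gt; 1</pre>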
<p>An important note on NULLs in records. When checking whether there are accounts with no contacts, you might try a query like:</p>
<pre style="padding-left:30px;">select count(id) from accounts
  where id not in
    (select acntId from contacts)</pre>
<p>However, this query actually returns a count of zero, whereas the counts above tell us it should return 8,744 (16,538 accounts, of which only 7,794 have contacts). The problem is the NULL values in the acntId column of contacts, and their interaction with the IN predicate.</p>
<p>When none of the values involved is NULL, the expression &#8220;value IN(a, b, c)&#8221; can only return TRUE or FALSE. If the set of values being tested contains NULL, however, then the expression &#8220;value IN(a, b, NULL)&#8221; can only return TRUE or UNKNOWN; it cannot return FALSE. Therefore NOT IN() can only return NOT TRUE (which is FALSE) or NOT UNKNOWN (which is still UNKNOWN), neither of which is TRUE. So the WHERE clause never evaluates to TRUE, no rows pass the filter, and we count zero records.</p>
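<p>Two small sketches illustrate this, assuming SQL Server with its default ANSI NULL handling. The first shows the three-valued logic on literal values; the second is one way to make the query NULL-safe, by removing the NULLs from the sub-query:</p>
<pre style="padding-left:30px;">-- 1 NOT IN (2, NULL) evaluates to UNKNOWN, so the ELSE branch is taken
select case when 1 not in (2, null)
         then 'true' else 'not true' end

-- NULL-safe version of the NOT IN query
select count(id) from accounts
  where id not in
    (select acntId from contacts
      where acntId is not null)</pre>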
<p>If any of you have favourite SQL for data checking, or you see better ways of achieving the results shown here, please feel free to comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inaplex.com/2009/05/17/sql-for-data-quality/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/586002b536410e4020b4a1a259e3e871?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">inaplex</media:title>
		</media:content>
	</item>
		<item>
		<title>So you think you have data quality challenges?</title>
		<link>http://blog.inaplex.com/2009/04/25/so-you-think-you-have-data-quality-challenges/</link>
		<comments>http://blog.inaplex.com/2009/04/25/so-you-think-you-have-data-quality-challenges/#comments</comments>
		<pubDate>Sat, 25 Apr 2009 18:05:49 +0000</pubDate>
		<dc:creator>inaplex</dc:creator>
				<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://blog.inaplex.com/?p=41</guid>
		<description><![CDATA[One of the issues faced by most CRM systems is handling (preferably avoiding) duplicates. Inaport provides a range of different matching techniques to assist in avoiding duplicates when importing data. This interesting article in the New York Times provides some insight into the difficulties faced in China. By some estimates, just 100 surnames cover 85% of [...]]]></description>
			<content:encoded><![CDATA[<p>One of the issues faced by most CRM systems is handling (preferably avoiding) duplicates. Inaport provides a range of different matching techniques to assist in avoiding duplicates when importing data.</p>
<p><a href="http://www.nytimes.com/2009/04/21/world/asia/21china.html?th&amp;emc=th">This</a> interesting article in the New York Times provides some insight into the difficulties faced in China. By some estimates, just 100 surnames cover 85% of China&#8217;s 1.3 billion citizens. By contrast, 70,000 surnames cover 90% of American citizens.</p>
<p>Chinese citizens try to overcome some of the potential for confusion by creative use of the extensive Chinese character set of 55,000 characters. Unfortunately, this runs directly counter to the government&#8217;s efforts to computerize and standardise, with a set of &#8220;only&#8221; 32,252 characters. An even more restricted list of 8,000 approved characters is to be issued later this year. This is leading to situations where people cannot get identity cards issued, because the characters used in their name are not available in the government systems.</p>
<p>Inaport supports Unicode, so in principle it can be used for matching against the full Chinese character set. However, I have to confess that (so far) we have not had to put this to the test, even though Inaport is currently being used by customers in China (and Japan). Should this happen, I&#8217;ll update the post with the results.</p>
<p>Regards</p>
<p>David</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inaplex.com/2009/04/25/so-you-think-you-have-data-quality-challenges/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/586002b536410e4020b4a1a259e3e871?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">inaplex</media:title>
		</media:content>
	</item>
	</channel>
</rss>
