numbers. I do not have control over the contents of the lists The only
unique field between the two is the phone number. I need to be able to
inner join the two lists on phone number.
This would normally be straigt forward but the problem is that they are
formated different and one of them does't even have a control on the
formating.
*Phone numbers are US phone numbers only.
The one list (table) that does have a control uses this format
AAA3334444
where AAA is the area code
333 is the 3 digit prefix
4444 is the four digit suffix
the second list (table) does not have any standardized formating and
can be filled with extraneous spaces, parentheses and dashes not to
mention the leading 1 in some instances.
I thought that I could do some kind of regular expression to do a
comparison but I havn't as yet found a good resource to tell me how to
do it or if it is even possible. Or maybe break up the one I know has
a standardized format into something like this:
'%AAA%333%4444%' and doing my comparison that way. however It is very
important that only those list items in both list that are truly the
same place be listed.
suggestions and solutions are apreciated"Dan Gidman" <danatcofo@.gmail.com> wrote in message
news:1109353284.020286.122200@.g14g2000cwa.googlegr oups.com...
> Okay all I have a problem. I have two list of adresses and phone
> numbers. I do not have control over the contents of the lists The only
> unique field between the two is the phone number. I need to be able to
> inner join the two lists on phone number.
> This would normally be straigt forward but the problem is that they are
> formated different and one of them does't even have a control on the
> formating.
> *Phone numbers are US phone numbers only.
> The one list (table) that does have a control uses this format
> AAA3334444
> where AAA is the area code
> 333 is the 3 digit prefix
> 4444 is the four digit suffix
> the second list (table) does not have any standardized formating and
> can be filled with extraneous spaces, parentheses and dashes not to
> mention the leading 1 in some instances.
> I thought that I could do some kind of regular expression to do a
> comparison but I havn't as yet found a good resource to tell me how to
> do it or if it is even possible. Or maybe break up the one I know has
> a standardized format into something like this:
> '%AAA%333%4444%' and doing my comparison that way. however It is very
> important that only those list items in both list that are truly the
> same place be listed.
> suggestions and solutions are apreciated
It probably depends how bad the data is - if the number and type of the
unwanted characters is relatively predictable, then you can achieve quite a
lot with REPLACE():
create table #t (pnum varchar(20))
insert into #t select '1-800-456 7890 '
insert into #t select '(800)- 456 7890'
insert into #t select '1 800 456 7890'
insert into #t select '+1 800 (456) 7890 '
insert into #t select '(800) (456) 7890'
select pnum from #t
/* Strip out unwanted characters */
update #t set pnum = replace(pnum, ' ', '')
update #t set pnum = replace(pnum, '(', '')
update #t set pnum = replace(pnum, ')', '')
update #t set pnum = replace(pnum, '-', '')
update #t set pnum = replace(pnum, '+', '')
/* Remove leading 1 */
update #t set pnum = stuff(pnum, 1, 1, '')
where left(pnum,1) = '1'
and len(pnum) = 11
select pnum from #t
But if it gets much more complicated than that, I would consider using
bcp.exe to export the data, clean it up with an external script in a
language that has better string functions than TSQL, then load it again.
It is possible to use regexes in TSQL, by instantiating the VBScript Regex
object using the sp_OA% procedures, but it's rather clumsy and requires
sysadmin permissions, so it's probably not a good general solution.
Simon|||I have achieved some good success with nesting using the replace
function like you have suggested it. It occured to me after doing my
initial post.
replace(replace(replace(replace(number, '(',''), ')',''), '-',''), '
','')
this actually works very well. still not getting the results I need
however. I think that I next need to check for a 1 at the begining of
the string and remove it.
after that I think if I just grab the first 10 digits it will get
something I can compare against|||I think I found the answer to my own question. so far this little sql
statement is a wonder worker for cleaning up the data.
SELECT PhoneNumber = CASE WHEN SUBSTRING(temper.PhoneNumber,1,1) = '1'
THEN SUBSTRING(temper.PhoneNumber,2,11) ELSE
SUBSTRING(temper.PhoneNumber,1,10) END FROM (SELECT
Replace(Replace(Replace(Replace(Replace(LatestTWPr ops.PhoneNumber,'
',''),'-',''),'(',''),')',''),'/','') AS PhoneNumber FROM LatestTWProps
WHERE countryCode = 'US' AND PhoneNumber IS NOT NULL) AS temper
No comments:
Post a Comment