Sunday, March 11, 2012

Database Anomaly

Hi

I have a table call Employee, and got a field name call Employee Name

Employee Name

--, --

%%%,

(Gpsi) Dr Ac, Baruah

(Gpsi) Dr Ap, Dhital

0, 0

1 To 1, Centre

ABAYOMI, EMJ

4, US

5F4MBL01, 5F4MBL01

5Jxths01, 5Jxths01

A&E Dept(Rochdale, Zts)

NULL

ABAYASIRIWARDA, JM

Shaw, A

NULL

Zzzzzzzzzzzzzz, Aaaaaaaaaaaa

From this example data, I need to get only bolded records only, rest of the records are garbage records.

Eg: garbage Records like: “A&E Dept(Rochdale, Zts) “ has got a ‘&’ sign is not a valid employee name

“1 To 1, Centre” has got a ‘1’ is not a valid employee name

“%%%,” has got a ‘%sign is not a valid employee name

what is the best way to identify only valid emplyee names ?

Please help me to solve this problem

Sujithf

Sujithf:

Here is my first pass at this; I have quite a few questions:

-- --
-- Issues:
-- (1) What non alphabetic characters besides comma are allowed?
-- (2) I have assumed that a name must be formatted "lastName, firstName"
-- to be legal. Therefore, the name "Madonna" is filtered out.
-- (3) I am allowing the single quote, hyphen, period and comma characters
-- and all alphabetics. How does this list need to change?
-- (4) I set the "repeating letter" threshold at 3 repeated letters being
-- invalid.
-- (5) I put in a special condition for the 3 repeated letters condition
-- to allow "Fred Flintstone III" to pass. If this is not valid
-- this special condition test can be eliminated.
-- (6) If Roman numerals as part of the name aren't valid a test for this
-- will need to be put together.
-- (7) I have assumed that the name "Katie O'Neal" is valid.
-- (8) I have assumed that the name "Mike De La Hoz" is valid.
-- (9) I have assumed that the name [Jimmy "The Z" De Zego] is not valid.
-- --
set nocount on

declare @.names table
( employeeName varchar (30)
)
insert into @.names values ('- -, - -')
insert into @.names values ('%%%')
insert into @.names values ('(Gpsi) Dr Ac, Baruah')
insert into @.names values ('(Gpsi) Dr Ap, Dhital')
insert into @.names values ('0, 0')
insert into @.names values ('1 to 1, Centre')
insert into @.names values ('4, US')
insert into @.names values ('Abayomi, EMJ')
insert into @.names values ('5F4MBL01, 5F4MBL01')
insert into @.names values ('5Jxths01, 5Jxths01')
insert into @.names values (null)
insert into @.names values ('Abayasiriwarda, JM')
insert into @.names values (null)
insert into @.names values ('Shaw, A')
insert into @.names values ('Zzzzzzzzzzzzz, Aaaaaaaaaaaaa')
insert into @.names values ('A&E Dept(Rochdale, Zts)')

insert into @.names values ('Madonna')
insert into @.names values ('Thant, U')
insert into @.names values ('U, Betcha')
insert into @.names values ('U, U')
insert into @.names values ('Flintstone, Fred Jr')
insert into @.names values ('Flintstone, Fred III')
insert into @.names values ('Flintstone, Fred VVV')
insert into @.names values ('O''Neal, Katie')
insert into @.names values ('De La Hoz, Mike')
insert into @.names values ('Aardvark, Jimmy')
insert into @.names values ('Taylor-Curry, Ann')
insert into @.names values ('De Zego, Jimmy "The Z"')
insert into @.names values ('Suds, Dog &')
insert into @.names values ('Auto, Big 4')
insert into @.names values ('This, i''s, a-bad, name.')
--select * from @.names

select employeeName
from @.names
where employeeName like '[abcdefghijklmnopqrstuvwxyz]%, [abcdefghijklmnopqrstuvwxyz]%'
and employeeName not like '%[^abcdefghijklmnopqrstuvwxyz, ''-.]%'
and len (employeeName) = len (replace (employeeName, ',', '')) + 1
and employeeName not like '%aaa%'
and employeeName not like '%bbb%'
and employeeName not like '%ccc%'
and employeeName not like '%ddd%'
and employeeName not like '%eee%'
and employeeName not like '%fff%'
and employeeName not like '%ggg%'
and employeeName not like '%hhh%'
and employeeName not like '%iiii%'
and employeeName not like '%jjj%'
and employeeName not like '%kkk%'
and employeeName not like '%lll%'
and employeeName not like '%mmm%'
and employeeName not like '%nnn%'
and employeeName not like '%ooo%'
and employeeName not like '%ppp%'
and employeeName not like '%qqq%'
and employeeName not like '%rrr%'
and employeeName not like '%sss%'
and employeeName not like '%ttt%'
and employeeName not like '%uuu%'
and employeeName not like '%vvv%'
and employeeName not like '%www%'
and employeeName not like '%xxx%'
and employeeName not like '%yyy%'
and employeeName not like '%zzz%'


-- Sample Output:

-- employeeName
--
-- Abayomi, EMJ
-- Abayasiriwarda, JM
-- Shaw, A
-- Thant, U
-- U, Betcha
-- U, U
-- Flintstone, Fred Jr
-- Flintstone, Fred III
-- O'Neal, Katie
-- De La Hoz, Mike
-- Aardvark, Jimmy
-- Taylor-Curry, Ann

No comments:

Post a Comment