Saturday, February 25, 2012
data warehouse truncate question
database, creates tables, then I have a package that loads the data into the
data warehouse. Now, I got the request in to only load the data and not drop
and create the database every time the package runs due to the package is
going to run every 5-10 minutes. Now I never created an SSIS package until
now, so how can I accomplish in only loading the tables every time its ran
instead of dropping and creating the database every time the package is ran?
so in a nutshell.
I need a package to load the data only if the data does not exsist and not
drop and create the database every time.
any suggestions on how this can be done?
Two methods come to mind.
1) Simple insert where not exists statements for each table. Can be VERY
inefficient, but easy to construct.
2) Build system that tracks the last-inserted key value for each table.
Reference the appropriate key in the where clause of each insert statement.
If you need to also track modified data, then it gets MUCH more complicated
because you need triggers on every base table to insert the modified row's
PKs into a refresh holding table which would also be reference for the
warehouse update statements.
TheSQLGuru
President
Indicium Resources, Inc.
"Mike" <igotyourdotnet@.gmail.com> wrote in message
news:%23%23HSmywaHHA.1300@.TK2MSFTNGP02.phx.gbl...
> have an SSIS project and I have a package that drops and creates the
> database, creates tables, then I have a package that loads the data into
> the
> data warehouse. Now, I got the request in to only load the data and not
> drop
> and create the database every time the package runs due to the package is
> going to run every 5-10 minutes. Now I never created an SSIS package until
> now, so how can I accomplish in only loading the tables every time its ran
> instead of dropping and creating the database every time the package is
> ran?
> so in a nutshell.
> I need a package to load the data only if the data does not exsist and not
> drop and create the database every time.
> any suggestions on how this can be done?
>
>
data warehouse truncate question
database, creates tables, then I have a package that loads the data into the
data warehouse. Now, I got the request in to only load the data and not drop
and create the database every time the package runs due to the package is
going to run every 5-10 minutes. Now I never created an SSIS package until
now, so how can I accomplish in only loading the tables every time its ran
instead of dropping and creating the database every time the package is ran?
so in a nutshell.
I need a package to load the data only if the data does not exsist and not
drop and create the database every time.
any suggestions on how this can be done?Two methods come to mind.
1) Simple insert where not exists statements for each table. Can be VERY
inefficient, but easy to construct.
2) Build system that tracks the last-inserted key value for each table.
Reference the appropriate key in the where clause of each insert statement.
If you need to also track modified data, then it gets MUCH more complicated
because you need triggers on every base table to insert the modified row's
PKs into a refresh holding table which would also be reference for the
warehouse update statements.
TheSQLGuru
President
Indicium Resources, Inc.
"Mike" <igotyourdotnet@.gmail.com> wrote in message
news:%23%23HSmywaHHA.1300@.TK2MSFTNGP02.phx.gbl...
> have an SSIS project and I have a package that drops and creates the
> database, creates tables, then I have a package that loads the data into
> the
> data warehouse. Now, I got the request in to only load the data and not
> drop
> and create the database every time the package runs due to the package is
> going to run every 5-10 minutes. Now I never created an SSIS package until
> now, so how can I accomplish in only loading the tables every time its ran
> instead of dropping and creating the database every time the package is
> ran?
> so in a nutshell.
> I need a package to load the data only if the data does not exsist and not
> drop and create the database every time.
> any suggestions on how this can be done?
>
>
Friday, February 24, 2012
Data Validation
Being relatively new to SSIS, I'm looking for advice, or a best practice, regarding data validation before extracting the data for a transformation.
One of my project's require that certain data be validated in staging tables before it is loaded. The validations include checking for null values, verifying that a field is populated with apropriate values etc... The entire batch of data (good records and bad records) may be rejected depending on the validations.
I have a couple of different thoughts on how this could be handled...
- Run a series of validation queries on the data before executing an SSIS package Run some kind of validation transformation (does one exist or should I write a custom transformation?) Place contraints on the target tables so that bad records error out on the load Something else... I could be missing the completely obvious
#3 doesn't seem to viable as the entire load may be rejected if some of the data is bad...
Any thoughts?
You could filter out the bad data very easily using a Conditional Split transform. Better still, you can pipe that bad data somewhere for later analysis.
Does that help?
-Jamie
|||I like both of thoses suggestions. With the conditional split transform, could I then update the original record with a status flag?|||Once you pipe the data off elsewhere from the Conditional Split you can do what you want with it so yes, you should be able to do this! You would use an OLE DB Command transform.
You may have a problem (i.e. blocking) when trying to update a table that you are selecting from in the source - but this can be easily alleviated by dropping the update dataset into a raw file and then doing the update in a seperate data-flow.
-Jamie
|||Thank you!
I was just testing an update using the OLE DB Command Transform. The raw file seems like good work around if the row is locked!
Data types question - Varbinary and timestamps
Greetings once again SSIS friends,
I have some source tables which contain timestamp fields (that's timestamp data type not datetime). My dimension table holds the maximum timestamp value as a varbinary(8).
I want my package to have a variable that holds that value but I don't know which data type to use for this. The reason for this is because I want to use that variable to then retrieve all records from my source table that have a timestamp value greater than the value stored in the variable.
Please advise on what data type is suitable.
Thanks for your help in advance.
Greetings once again guys,
I found the following article which gives a workable solution to my problem, but I am still interested to find out the equivalent of timestamp data type in SSIS.
http://solidqualitylearning.com/blogs/erik/archive/2005/12/09/1499.aspx
|||
DT_BYTES
binary, varbinary, timestamp
binary, varbinary, timestamp
BigBinary, VarBinary
RAW
RAW
You will want to use DT_BYTES. See this article on MSDN.
http://msdn2.microsoft.com/en-us/library/ms141036.aspx
Does this answer your question?
Sunday, February 19, 2012
Data type problem
Greetings SSIS friends,
I cast one of my fields from a decimal to an int in my data source (generated from a SQL command). But when I try and map that field to another one of my tables, i get an error saying that I can not cvonvert from decimal to Int32.
Is there any solution which doest require using the derived column or data conversion components?!
Your help would be appreciated.
Can you provide your SQL so that we can look at it? If you are casting to an integer from a decimal in the SQL, SSIS should only see an integer...|||
Hi Phil,
That's what I don't understand.
The SQL is pretty straightforward to be honest.
select
distinct
b.eventid as evt_eventid,
b.meetid as evt_meetid,
cast(h.longname as varchar) as evt_long_name,
cast(h.shortname as varchar) as evt_source_short_name,
b.expstartdate as evt_expstartdate,
b.expstarttime as evt_expstarttime,
b.eventnumber as evt_eventnumber,
b.expertgroup as evt_expertgroup,
b.currentrun as evt_currentrun,
b.enddate as evt_enddate,
b.endtime as evt_endtime,
b.eventstatus as evt_eventstatus,
f.etypeid evt_etypeid,
cast(g.longname as varchar) as evt_type_long_name,
cast(h.longname as varchar) as evt_name,
b.maxplaces as evt_maxplaces,
b.maxpayout as evt_specific_maxpayout,
f.maxpayout as evt_default_type_maxpayout,
b.offdate as evt_offdate,
b.offtime as evt_offtime,
cast(b.callitap as char(1)) as evt_callitap,
cast(isnull(b.maxstake, 0) as int) as evt_maxstake,
b.scoop6leg as evt_scoop6leg,
b.riskoff as evt_riskoff,
b.userofftime as evt_userofftime,
b.delayofftime as evt_delayofftime,
cast(b.suspendliability as int) as evt_suspendliability,
b.termsid as evt_specific_termsid,
i.termsid as evt_default_termsid,
cast(j.longname as varchar) as evt_default_terms,
cast(l.longname as varchar) as specific_evt_terms,
convert(varchar(20), convert(bigint, b.updatetimestamp)) as evt_updatetimestamp
from meetings as a
inner join events as b
on a.meetid = b.meetid
inner join names as c
on a.nameid = c.nameid
inner join EvenType as d
on a.etypeid = d.etypeid
inner join names as e
on d.nameid = e.nameid
inner join EvenType as f
on b.etypeid = f.etypeid
inner join names as g
on f.nameid = g.nameid
inner join names as h
on b.nameid = h.nameid
inner join terms as i
on f.termsid = i.termsid
inner join names as j
on i.nameid = j.nameid
left join terms as k
on b.termsid = k.termsid
left join names as l
on k.nameid = l.nameid
where
convert(varchar(20), convert(bigint, a.updatetimestamp)) > ?
or convert(varchar(20), convert(bigint, b.updatetimestamp)) > ?
order by b.meetid, b.eventid
In SSIS however, it still thinks the column is a decimal.
|||Did you previously have the decimal coming into the source and then later changed the SQL to cast it to an integer?In the source column mappings, delete the row that corresponds to the column you are working with, and re-map it. Does that help?|||
Hi Phil,
Yes you are right. I unmapped the column and then remaped it and reconfigured my other merge joins and it is now in the data type that I cast it in the SQL code.
Thanks for your help.
Data Type in Oracle is not translating in SSIS datatype
When I try to join the colunms it claims the data types do not match the only differnece is the percision.?
ORACLE
select CAST(group_id AS NUMBER(18)) group_id from blah blah table
SSIS data type = [DT_NUMERIC], Percision = 38
SQL Sever data source
SELECT
CAST(PRINCIPAL_ID as numeric) GROUP_ID,
SSIS data type = [DT_NUMERIC], Percision = 18The metadata of the columns being joined should have to match exactly. (same precision, same
scale)
Friday, February 17, 2012
Data type conversions
I am just starting to lean and use SSIS to get data from an AS400 DB2 database. (I also posted this question to the SSIS newsgroup).
It all works for columns that are just a data type of Character, however I am having a problem with the Numeric data type. I keep getting an error "column cannot be inserted because the conversion between types DT_NUMERIC and DT_R8 is not supported."
In SQL, what do I need to declare the column datatype as? I have tried several.
Thanks for the information.
try DT_NUMERIC - decimal, R8 -float
Thanks.