Saturday, February 25, 2012

Data Warehouse & OLAP interview questions

Hi, I have a tech interview coming up for a team lead position for the data
warehouse/OLAP team. I was wondering if some of you wouldn't mind posting
technical questions (anything related to SQL Server/Data Warehouse/OLAP) for
me to use as practice.
Thanks.Assuming it's more "tech lead" than "manager lead" (based on your email I
assume you're not in management)
Question: What are the benefits and risks of co-hosting analysis services
and sql server RDBMS on the same OS image?
"Terry" <spankme@.nospam.com> wrote in message
news:ODd8YwAwDHA.620@.TK2MSFTNGP10.phx.gbl...
quote:

> Hi, I have a tech interview coming up for a team lead position for the

data
quote:

> warehouse/OLAP team. I was wondering if some of you wouldn't mind posting
> technical questions (anything related to SQL Server/Data Warehouse/OLAP)

for
quote:

> me to use as practice.
> Thanks.
>
|||Hi Terry,
Here's one of my favorite questions, because it can go on for as long as
someone wants to talk.
Name some performance aspects of bulk loading and what can be done to
improve the process.
Here's what I listen for:
Hardware Issues like RAM issues, Networking issues, Hard Drive issues, RAID
levels, CPU speeds, Bus speeds
Database Level Issues like Table locking, Column Sizes and Data Typing,
Batch sizes
Architecture Issues like Where the transforms happen (on the server or a
distributed architecture to use other CPUs), Parallelism issues
Of course, the lists go on and on, but that's the general idea. Much can be
gained from asking a question like this, because one can determine, almost
immediately, the type of experience a candidate has. If they are DBAs that
have quite a bit of hardware experience, they'll stick to answering with
improvements that can be made in hardware. Same goes for database
improvements and architecture improvements. If they happen to remark on all
three subject areas, they're probably well rounded enough to be able to
think about all three aspects simultaneously and can probably handle a much
greater load than someone that sticks to one area, or one that answers...I
don't know.
HTH, good luck.
Regards,
Don R. Watters
Data Group Manager
PhotoWorks, Inc.
"Terry" <spankme@.nospam.com> wrote in message
news:ODd8YwAwDHA.620@.TK2MSFTNGP10.phx.gbl...
quote:

> Hi, I have a tech interview coming up for a team lead position for the

data
quote:

> warehouse/OLAP team. I was wondering if some of you wouldn't mind posting
> technical questions (anything related to SQL Server/Data Warehouse/OLAP)

for
quote:

> me to use as practice.
> Thanks.
>
|||I would rephrase the term "bulk loading" (which has a very specific context
in SQL Server) to "ETL Processes"
And you seem to imply parallelism is a side issue in terms of performance.
It is the core of any high performance ETL process.
Kevin Connell, MCDBA
----
The views expressed here are my own
and not of my employer.
----
"Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
news:#M3nzfbyDHA.1668@.TK2MSFTNGP10.phx.gbl...
quote:

> Hi Terry,
> Here's one of my favorite questions, because it can go on for as long as
> someone wants to talk.
> Name some performance aspects of bulk loading and what can be done to
> improve the process.
> Here's what I listen for:
> hardware Issues like RAM issues, Networking issues, Hard Drive issues,

RAID
quote:

> levels, CPU speeds, Bus speeds
> Database Level Issues like Table locking, Column Sizes and Data Typing,
> Batch sizes
> Architecture Issues like Where the transforms happen (on the server or a
> distributed architecture to use other CPUs), Parallelism issues
> Of course, the lists go on and on, but that's the general idea. Much can

be
quote:

> gained from asking a question like this, because one can determine, almost
> immediately, the type of experience a candidate has. If they are DBAs

that
quote:

> have quite a bit of hardware experience, they'll stick to answering with
> improvements that can be made in hardware. Same goes for database
> improvements and architecture improvements. If they happen to remark on

all
quote:

> three subject areas, they're probably well rounded enough to be able to
> think about all three aspects simultaneously and can probably handle a

much
quote:

> greater load than someone that sticks to one area, or one that answers...I
> don't know.
> HTH, good luck.
> --
> Regards,
> Don R. Watters
> Data Group Manager
> PhotoWorks, Inc.
>
> "Terry" <spankme@.nospam.com> wrote in message
> news:ODd8YwAwDHA.620@.TK2MSFTNGP10.phx.gbl...
> data
posting[QUOTE]
> for
>
|||Hey Kevin,
Isn't this a SQL Server Newsgroup? The question is specific to SQL
Server.
Parallelism is certainly an issue if you're using DTS. If you don't
specifically set up the package to use parallelism then your only choice is
to run multiple packages at once to get parallel processes running. If you
have abstracted the data load in such a way that a single package is used to
load multiple files dynamically, then parallelism needs to be considered in
order to get the package running mutliple threads within the same package.
There are different degrees of how and where parallelism can and should be
used. Ensuring that the query processor is setup correctly to handle
parallel processing of the data load is also a consideration. One can also
have multiple servers running to get the data loaded across federated
servers. This is also considered to be parallelism.
One more point, just bringing up parallelism means that the candidate
understands that it's something that needs to be considered. It doesn't
mean that it's not part of the process. Every one of the things that I gave
as examples are needed for high performance systems. You can't have high
performance with 256 MB of RAM, nor can you have a high performance system
with a 5400 RPM ATA drive with no cache. Sure, you can get the job done,
but it won't be very performant. The same goes for parallel processing. I
defintely was not implying that parallelism was a side issue.
Regards,
Don R. Watters
Data Group Manager
PhotoWorks, Inc.
"Kevin" <ReplyTo@.Newsgroups.only> wrote in message
news:O6B5gbmyDHA.2064@.TK2MSFTNGP10.phx.gbl...
quote:

> I would rephrase the term "bulk loading" (which has a very specific

context
quote:

> in SQL Server) to "ETL Processes"
> And you seem to imply parallelism is a side issue in terms of performance.
> It is the core of any high performance ETL process.
> --
> Kevin Connell, MCDBA
> ----
> The views expressed here are my own
> and not of my employer.
> ----
> "Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
> news:#M3nzfbyDHA.1668@.TK2MSFTNGP10.phx.gbl...
> RAID
can[QUOTE]
> be
almost[QUOTE]
> that
> all
> much
answers...I[QUOTE]
> posting
Warehouse/OLAP)[QUOTE]
>
|||Don
Your post is slightly confusing
Most people unserstand the word "Parallelism" to mean the process by which
SQL Server spawns another thread on another CPU to better execute a Query.
I believe the way you are using it here is for "In parallel". DTS has not a
right to do with "Parellelism" SQL Server will decide that / or not. For
instance no amount of DTS jiggery pokery will create "Parllelism" on a 1 CPU
box.
For Parallelism there were problems in 7 whereby a query using it would
execute far slower. I have also seen this in 2000 and therefore set (MAXDOP
1) in my queries but much less so than in 7.
mk:@.MSITStore:C:\Program%20Files\Microso
ft%20SQL%20Server\80\Tools\Books\arc
hitec.chm::/8_ar_sa_163x.htm
Please forgive me if I have misread your post.
--
Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org
"Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
news:excxFymyDHA.3220@.tk2msftngp13.phx.gbl...
quote:

> Hey Kevin,
> Isn't this a SQL Server Newsgroup? The question is specific to SQL
> Server.
> Parallelism is certainly an issue if you're using DTS. If you don't
> specifically set up the package to use parallelism then your only choice

is
quote:

> to run multiple packages at once to get parallel processes running. If

you
quote:

> have abstracted the data load in such a way that a single package is used

to
quote:

> load multiple files dynamically, then parallelism needs to be considered

in
quote:

> order to get the package running mutliple threads within the same package.
> There are different degrees of how and where parallelism can and should be
> used. Ensuring that the query processor is setup correctly to handle
> parallel processing of the data load is also a consideration. One can

also
quote:

> have multiple servers running to get the data loaded across federated
> servers. This is also considered to be parallelism.
> One more point, just bringing up parallelism means that the candidate
> understands that it's something that needs to be considered. It doesn't
> mean that it's not part of the process. Every one of the things that I

gave
quote:

> as examples are needed for high performance systems. You can't have high
> performance with 256 MB of RAM, nor can you have a high performance system
> with a 5400 RPM ATA drive with no cache. Sure, you can get the job done,
> but it won't be very performant. The same goes for parallel processing.

I
quote:

> defintely was not implying that parallelism was a side issue.
> --
> Regards,
> Don R. Watters
> Data Group Manager
> PhotoWorks, Inc.
>
> "Kevin" <ReplyTo@.Newsgroups.only> wrote in message
> news:O6B5gbmyDHA.2064@.TK2MSFTNGP10.phx.gbl...
> context
performance.[QUOTE]
as[QUOTE]
Typing,[QUOTE]
a[QUOTE]
> can
> almost
with[QUOTE]
on[QUOTE]
to[QUOTE]
> answers...I
the[QUOTE]
> Warehouse/OLAP)
>
|||Hi Allan,
I don't think it's confusing if you read and understand the first post.
I think it's a matter of semantics. Nothing in my original post that was
questioned by a subsequent reader pointed to any flaw of SQL Server, whereby
that flaw had to be considered. My original post said that one thing that I
listen for in the interview is for the interviewee to bring up parallelism.
Regardless of if they bring it up as a matter of running processes in
parallel or if they being it up as query parallelism, to me they're one in
the same. I need to hear from the interviewee that they understand the
complexity of dealing with SMP.
Regards,
Don R. Watters
Data Group Manager
PhotoWorks, Inc.
"Allan Mitchell" <allan@.no-spam.sqldts.com> wrote in message
news:%23beEqVn6DHA.2432@.TK2MSFTNGP10.phx.gbl...
quote:

> Don
> Your post is slightly confusing
> Most people unserstand the word "Parallelism" to mean the process by which
> SQL Server spawns another thread on another CPU to better execute a Query.
> I believe the way you are using it here is for "In parallel". DTS has not

a
quote:

> right to do with "Parellelism" SQL Server will decide that / or not. For
> instance no amount of DTS jiggery pokery will create "Parllelism" on a 1

CPU
quote:

> box.
> For Parallelism there were problems in 7 whereby a query using it would
> execute far slower. I have also seen this in 2000 and therefore set

(MAXDOP
quote:

> 1) in my queries but much less so than in 7.
>

mk:@.MSITStore:C:\Program%20Files\Microso
ft%20SQL%20Server\80\Tools\Books\arc[QUO
TE]
> hitec.chm::/8_ar_sa_163x.htm
> Please forgive me if I have misread your post.
>
> --
> --
> Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
> www.allisonmitchell.com - Expert SQL Server Consultancy.
> www.SQLDTS.com - The site for all your DTS needs.
> I support PASS - the definitive, global community
> for SQL Server professionals - http://www.sqlpass.org
>
> "Don R. Watters" <DonW@.PhotoWorks.Com> wrote in message
> news:excxFymyDHA.3220@.tk2msftngp13.phx.gbl...
> is
> you
used
quote:

> to
> in
package.[QUOTE]
be[QUOTE]
> also
candidate[QUOTE]
> gave
high[QUOTE]
system[QUOTE]
done,[QUOTE]
> I
> performance.
long[QUOTE]
> as
to[QUOTE]
issues,[QUOTE]
> Typing,
or[QUOTE]
> a
Much[QUOTE]
DBAs[QUOTE]
> with
remark[QUOTE]
> on
> to
a[QUOTE]
> the
>
|||OK
Semantics it is then.
I interpret Parallelism and "In Parallel" to be two different things
especially when tied with DTS.
--
Allan Mitchell MCSE,MCDBA, (Microsoft SQL Server MVP)
www.allisonmitchell.com - Expert SQL Server Consultancy.
www.SQLDTS.com - The site for all your DTS needs.
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org
"Don R. Watters" <DonW@.Photoworks.com> wrote in message
news:uSv6hrn6DHA.1428@.TK2MSFTNGP12.phx.gbl...
quote:

> Hi Allan,
> I don't think it's confusing if you read and understand the first

post.
quote:

> I think it's a matter of semantics. Nothing in my original post that was
> questioned by a subsequent reader pointed to any flaw of SQL Server,

whereby
quote:

> that flaw had to be considered. My original post said that one thing that

I
quote:

> listen for in the interview is for the interviewee to bring up

parallelism.
quote:

> Regardless of if they bring it up as a matter of running processes in
> parallel or if they being it up as query parallelism, to me they're one in
> the same. I need to hear from the interviewee that they understand the
> complexity of dealing with SMP.
> --
> Regards,
> Don R. Watters
> Data Group Manager
> PhotoWorks, Inc.
>
> "Allan Mitchell" <allan@.no-spam.sqldts.com> wrote in message
> news:%23beEqVn6DHA.2432@.TK2MSFTNGP10.phx.gbl...
which[QUOTE]
Query.[QUOTE]
not[QUOTE]
> a
For[QUOTE]
> CPU
> (MAXDOP
>

mk:@.MSITStore:C:\Program%20Files\Microso
ft%20SQL%20Server\80\Tools\Books\arc[QUO
TE]
SQL
quote:

don't[QUOTE]
choice[QUOTE]
If[QUOTE]
> used
considered[QUOTE]
> package.
should[QUOTE]
> be
> candidate
doesn't[QUOTE]
I[QUOTE]
> high
> system
> done,
processing.[QUOTE]
> long
> to
> issues,
server[QUOTE]
> or
> Much
determine,[QUOTE]
> DBAs
answering[QUOTE]
> remark
able[QUOTE]
handle[QUOTE]
> a
for[QUOTE]
mind[QUOTE]
>

No comments:

Post a Comment