While there are many approaches which come and go for importing Excel and CSV data files into SQL Server nothing is as easy as using OpenDataSource. I know there are many who use SSIS and its brethren but some things when dealing Excel combined with SSIS, like nvarchar, costs time and diminishes its ease of use. So let's have a quick go of importing data, which I do on a local version of SQL running on my workstation.
Here's what my sample data looks like in Excel:
EmailAddress |
FirstName |
LastName |
scottg@contoso.com |
Scott |
Guthrie |
steves@contoso.com |
Steve |
Sanderson |
aarons@contoso.com |
Aaron |
Skonnard |
fritzo@contoso.com |
Fritz |
Onion |
johns@contoso.com |
John |
Smith |
scotth@contoso.com |
Scott |
Hunter |
madsk@contoso.com |
Mads |
Kristensen |
howardd@contoso.com |
Howard |
Dierking |
elijahm@contoso.com |
Elijah |
Manor |
estebang@contoso.com |
Esteban |
Garcia |
shawnw@contoso.com |
Shawn |
Wildermuth |
peteb@contoso.com |
Pete |
Brown |
reisenberg@contoso.com |
Rob |
Eisenberg |
timh@contoso.com |
Tim |
Heuer |
So open a query window in SSMS and fire off the following query:
SELECT distinct * --[First Name], [Last Name], [EmailAddress]
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\SampleEmailList.xls;Extended Properties=Excel 8.0')...[Sheet1$]
where [FirstName] is not null
You'll see I'm using the provider for Excel's (97-2003) older version that are still in xls binary format. Also note the location of the Data Source which is local to the SQL Server you are running.
While you'll see that this as a rather simplistic sample, imagine an Excel file with 20 or greater columns and you'll begin to appreciate the ability to see column names and the ability to get right to the data without having to use some wizard to move "from/to" a destination.
So you might begin to think of things you could use this for.
Say someone lands a huge (>10GB) text file (SampleHugeFile.csv) on your network and you have to take a look inside to see what was delivered. You're not going to use notepad or any other desktop client tool since most will eat your memory alive. Instead just use the following and you'll see what's in the file without incurring the memory penalty:
SELECT top 10 WorkRef, WorkKey
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\;Extended Properties="Text;HDR=YES"')...SampleHugeList#csv
I know some may see this as extreme but if you only have a hammer then everything starts to look like a nail.
Now for the final piece, since most of the world is using later versions of Excel you'll need slightly different OPENDATASOURCE provider notation:
select top 10 *
FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\SampleEmailListInLaterVersion.xlsx;Extended Properties=Excel 12.0')...[Sheet1$]
Note the change in providers when using the OPENDATASOURCE since you'll need to switch to this when using newer versions of Excel which support the xlsx file format.
Now start to mix it up and start using a CTE to manipulate the data like so:
;with CTE as
(
SELECT distinct * --[First Name], [Last Name], [EmailAddress]
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\SampleEmailList.xls;Extended Properties=Excel 8.0')...[Sheet1$]
where [FirstName] is not null
)
SELECT
PARSENAME(REPLACE(EmailAddress,'@','.'),3) Email,
PARSENAME(REPLACE(EmailAddress,'@','.'),2) Domain,
PARSENAME(REPLACE(EmailAddress,'@','.'),1) DomainExt
FROM CTE
results in:
Email |
Domain |
DomainExt |
Anderson769 |
contoso |
com |
Hayes231 |
contoso |
com |
Russell637 |
contoso |
com |
Simmons791 |
contoso |
com |
Young91 |
contoso |
com |
aarons |
contoso |
com |
Gonzales286 |
contoso |
com |
SQL Server has many hidden gems that you need to see just once and then you'll understand how it can increase your productivity.
Tags: Excel, SQL Server, OpenDataSource