Nov 22 2015

Importing Excel/CSV to SQL Server -- OpenDataSource

Category: Features | SQL Server · Administrator @ 07:49

 

While many approaches for importing Excel and CSV data files into SQL Server come and go, nothing is as easy as using OpenDataSource.  I know many use SSIS and its brethren, but some things about dealing with Excel in SSIS, like its nvarchar handling, cost time and diminish its ease of use.  So let's have a quick go at importing data, which I do on a local instance of SQL Server running on my workstation.

Here's what my sample data looks like in Excel:

EmailAddress            FirstName  LastName
scottg@contoso.com      Scott      Guthrie
steves@contoso.com      Steve      Sanderson
aarons@contoso.com      Aaron      Skonnard
fritzo@contoso.com      Fritz      Onion
johns@contoso.com       John       Smith
scotth@contoso.com      Scott      Hunter
madsk@contoso.com       Mads       Kristensen
howardd@contoso.com     Howard     Dierking
elijahm@contoso.com     Elijah     Manor
estebang@contoso.com    Esteban    Garcia
shawnw@contoso.com      Shawn      Wildermuth
peteb@contoso.com       Pete       Brown
reisenberg@contoso.com  Rob        Eisenberg
timh@contoso.com        Tim        Heuer

 

So open a query window in SSMS and fire off the following query:

SELECT distinct * --or just the columns: [EmailAddress], [FirstName], [LastName]
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\SampleEmailList.xls;Extended Properties=Excel 8.0')...[Sheet1$]
where [FirstName] is not null

You'll see I'm using the provider for older Excel versions (97-2003) that are still in the binary xls format. Also note that the Data Source path is local to the SQL Server instance you are running against.
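One caveat before you run it: OPENDATASOURCE depends on the Ad Hoc Distributed Queries server option, so if SQL Server complains that ad hoc access has been denied, have a sysadmin flip it on once:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;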

While this is a rather simplistic sample, imagine an Excel file with 20 or more columns and you'll begin to appreciate being able to see column names and get right to the data without having to use some wizard to move it "from/to" a destination.

So you might begin to think of things you could use this for.

Say someone lands a huge (>10GB) text file (SampleHugeList.csv) on your network and you have to take a look inside to see what was delivered.  You're not going to use Notepad or any other desktop client tool, since most will eat your memory alive.  Instead, just use the following and you'll see what's in the file without incurring the memory penalty:

SELECT  top 10 WorkRef, WorkKey
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\;Extended Properties="Text;HDR=YES"')...SampleHugeList#csv
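And once you've confirmed what was delivered, loading it is one keyword away.  A SELECT ... INTO stages the whole file (the target table name here is just illustrative):

SELECT WorkRef, WorkKey
INTO dbo.StagingHugeList --hypothetical staging table
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\;Extended Properties="Text;HDR=YES"')...SampleHugeList#csv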

I know some may see this as extreme but if you only have a hammer then everything starts to look like a nail.

Now for the final piece: since most of the world is using later versions of Excel, you'll need slightly different OPENDATASOURCE provider notation:

select top 10 *
FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\SampleEmailListInLaterVersion.xlsx;Extended Properties=Excel 12.0')...[Sheet1$]

Note the change of provider in the OPENDATASOURCE: you'll need Microsoft.ACE.OLEDB.12.0 for the newer versions of Excel that use the xlsx file format.
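One more tip from hard experience: when a column mixes numbers and text, the provider can guess the type wrong and hand back NULLs.  Adding IMEX=1 to the extended properties tells it to treat intermixed columns as text:

select top 10 *
FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\SampleEmailListInLaterVersion.xlsx;Extended Properties="Excel 12.0;IMEX=1"')...[Sheet1$]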

Now mix it up and use a CTE to manipulate the data, like so:

;with CTE as
(
SELECT distinct * --or just the columns: [EmailAddress], [FirstName], [LastName]
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\MSSQL\clients\INTLdist\Queries\LoadEandF\SampleEmailList.xls;Extended Properties=Excel 8.0')...[Sheet1$]
where [FirstName] is not null
)
SELECT
    PARSENAME(REPLACE(EmailAddress,'@','.'),3) Email,
    PARSENAME(REPLACE(EmailAddress,'@','.'),2) Domain,
    PARSENAME(REPLACE(EmailAddress,'@','.'),1) DomainExt
FROM CTE
 

results in:

Email        Domain   DomainExt
Anderson769  contoso  com
Hayes231     contoso  com
Russell637   contoso  com
Simmons791   contoso  com
Young91      contoso  com
aarons       contoso  com
Gonzales286  contoso  com

 

SQL Server has many hidden gems like this; you only need to see them once to understand how much they can increase your productivity.




Jun 28 2014

Comparing String Values for Similarities

Category: C# | SQL Server · Administrator @ 12:41

Many moons ago I embarked on a proof-of-concept project to see if I could use SQL Server to perform a matching process.

It was successful, but I still had a lingering suspicion that the underlying algorithm (Levenshtein) used to determine the sameness of human full names was less than optimal.  So what ensued was a period where I would look around the web, once in a while, to see if I had missed something.

I can't tell you how many algorithms I researched, converted to C#, and tested.  Don't ask me why, but anyone who starts with Levenshtein will most likely never find this algorithm, since so many Levenshtein-like alternatives vie for attention.

I finally found what I was looking for purely by accident:

Strike-A-Match: http://www.catalysoft.com/articles/StrikeAMatch.html

and kudos to the paste bin for the C# version: http://pastebin.com/EfcmR3Xx#

So now, if you ever need to compare human full names where the parts may come in any order, like:

  • last name, first name compared to first name, last name

Strike-A-Match will do the trick. Take a look at a comparison of the human name "Jimi Hendrix" to "Hendrix Jimi".

Strike-A-Match computes these two as exactly equivalent.
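If you're curious how it gets there, here's a minimal C# sketch of the idea (my own distillation, not the pastebin version verbatim): break each string into words, collect the adjacent letter pairs, and score the overlap with the Dice coefficient.

using System;
using System.Collections.Generic;

static class StrikeAMatch
{
    // Adjacent letter pairs per word, uppercased: "Jimi" -> JI, IM, MI
    static List<string> LetterPairs(string str)
    {
        var pairs = new List<string>();
        foreach (string word in str.ToUpperInvariant()
                                   .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            for (int i = 0; i < word.Length - 1; i++)
                pairs.Add(word.Substring(i, 2));
        return pairs;
    }

    // Dice coefficient: 2 * |shared pairs| / (|pairs1| + |pairs2|)
    public static double Similarity(string s1, string s2)
    {
        List<string> pairs1 = LetterPairs(s1);
        List<string> pairs2 = LetterPairs(s2);
        int union = pairs1.Count + pairs2.Count;
        int intersection = 0;

        foreach (string pair in pairs1)
        {
            int idx = pairs2.IndexOf(pair);
            if (idx >= 0)
            {
                intersection++;
                pairs2.RemoveAt(idx); // count each shared pair only once
            }
        }
        return union == 0 ? 0.0 : (2.0 * intersection) / union;
    }
}

// StrikeAMatch.Similarity("Jimi Hendrix", "Hendrix Jimi") returns 1.0

Since both orderings produce the same nine letter pairs, the score is 1.0 regardless of word order.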

Enough said.

I embedded this into a SQLCLR function and it works like a charm.

Levenshtein and all its brethren really don't get the job done when all you want to do is compare for similarity.

The web has the brightest ideas but try to look in all the dark corners.


Feb 19 2013

SQL Server Async Stored Procedures

Category: SQL Server | Stored Procedure · Administrator @ 08:05

As the saying goes, better late than never, and that goes double for this technique for executing stored procedures.

What I'm referring to is a post from Remus Rusanu, who explains in detail how to run stored procedures in an asynchronous fashion:

http://rusanu.com/2009/08/18/passing-parameters-to-a-background-procedure/

In the set of two articles, he explains how to take a stored procedure, even one with parameters, and run it without holding a client connection.

This is extremely useful if you need to let users run long-running procedures.

You may be asking yourself: why not just create a SQL Agent job and run that from the client, which has the same effect, since SQL Agent jobs run asynchronously?  That's true, but this technique is more flexible, since you can pass parameters from the client, whereas SQL Agent jobs have no facility for passing parameters directly.

He includes a detailed script for installing his helper stored procedures and the procedure for handling Service Broker.  Yes, I said Service Broker.

Don't get overly concerned: Service Broker is quite easy to set up and operate.

I won't go into his solution, since you've probably already read it; what I offer here are just some small modifications which make it a little nicer to use.

To get started, all you need is to have Service Broker enabled by the DBA:

ALTER DATABASE xxxxxx SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE xxxxxx SET ENABLE_BROKER;
ALTER DATABASE xxxxxx SET MULTI_USER;

 

Once this is done you're good to install all the Stored Procedures from Remus.

Check that it took effect:

SELECT is_broker_enabled from sys.databases where name = 'xxxxxx';

 

One of the modifications I made to the set of procedures was the ability to see the Start_Time of the running stored procedure.

In Remus's routines, all execution is within a Transaction which proves invaluable since you'll need the rollback capability in the case of failure.

But I don't think the Start_Time qualifies as something that needs to roll back with the transaction, and I want to display the Start_Time to the users who requested the run of the procedure.

So all I did was include an execution of a CLR procedure to update the Start_Time.  Since a CLR procedure running on its own (non-context) connection executes outside the transaction, it updates the Start_Time immediately.

I made the change here in the AsyncExecActivated procedure:

 

begin try;
    receive top (1)....... --(Remus's original receive, elided)

if (@messageTypeName = N'DEFAULT'....

    --my addition: the CLR procedure updates Start_Time outside the transaction
    EXECUTE ExecuteSQLNoQuery
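For reference, here's a minimal sketch of what a CLR procedure like ExecuteSQLNoQuery might look like; this is my guess at a reasonable shape, not the exact code, and the connection string is illustrative.  The trick is Enlist=false, which keeps the loopback connection out of the caller's transaction (note the assembly needs EXTERNAL_ACCESS permission to open it):

using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    // Runs a statement on its own (non-context) connection, so the write
    // commits immediately -- even if the caller's transaction later rolls
    // back, the Start_Time update survives.
    [SqlProcedure]
    public static void ExecuteSQLNoQuery(string sql)
    {
        // Enlist=false keeps this connection out of the ambient transaction.
        using (var conn = new SqlConnection(
            "Server=.;Database=xxxxxx;Integrated Security=true;Enlist=false"))
        {
            conn.Open();
            using (var cmd = new SqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }
}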

 

I also included a way to cancel a waiting procedure that is next in line for Service Broker execution.

It's crude, but it allows you to stack up requests and act on behalf of the user to cancel inadvertent ones.

Just update the error_message with anything (before the task runs) and it will effectively be cancelled.

select @error_message = error_message from dbo.tblTaskResults
      where [token] = @token;

--see if the request was canceled
if (@messageTypeName = N'DEFAULT' and @error_message is null)
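So the cancel itself is just an update, something like this (assuming the Start_Time column added above doubles as the "has it started" flag):

update dbo.tblTaskResults
   set error_message = N'Cancelled by user'
 where [token] = @token
   and start_time is null; --only cancel tasks that haven't started yet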

 

I've put together an ASP.NET page which allows users to initiate and cancel queued requests.

 

I've included my script for those who would like to build onto my recipe for Asynchronous Procedure Execution:

http://sdrv.ms/12r1d7C

I drive the whole web page, and the tasks available for execution, based solely upon a SQL configuration table.

This technique frees the developer to add new tasks at any time, and you no longer need to add more SQL Agent jobs.
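For anyone building on the recipe, the configuration table can be as simple as something like this (my illustrative shape, not the script's exact schema):

CREATE TABLE dbo.tblTaskConfig (
    TaskId        int identity(1,1) primary key,
    TaskName      nvarchar(128) not null, --shown to users on the page
    ProcedureName sysname not null,       --procedure handed to the async helper
    Parameters    nvarchar(max) null,     --serialized parameter values
    IsEnabled     bit not null default(1)
);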

Developers need time to dive into subject matter.  Heads-down developing is a trap.


Sep 30 2012

SSIS Package Execution with C# -- SQL Server

Category: C# | SQL Server · Administrator @ 08:16

There are times when it seems like IT management decisions are arbitrary and capricious.  This is one of them!

As with most shops, we have SQL Agent running the SSIS production packages, but as we migrate to newer servers we are now instructed to stop using SQL Agent for execution on the SQL Server.

OK -- so what's the substitute? 

I'm told to use BAT files executed through some other agent tool, or to use xp_cmdshell.

I feel like I'm going backwards in time and not moving forward with how an operational environment should be architected.  But thankfully some clever individuals have already paved the path to a more beautiful world.

And that world is C# execution of the packages.  You may be asking what the heck I'm talking about, but SSIS packages can run anywhere.  I know this sounds unconventional, but you can execute packages wherever you have the dependent DLLs and the appropriate .NET Framework.  Take a read through the following blog entry and you'll quickly see how to get SSIS packages running from any Windows server, not just a Windows server running SQL Server:

Running Packages from C#

The beauty of doing it this way is that any Windows server can now run a service which executes the packages, and base that execution upon a database configuration.

Benefits:

  • Configuration files:
    • Running the SSIS package through C# allows you to pass in "User Variables" to the packages.  So just read what you want from a SQL configuration table and pass it to the package, as in this excerpt:
    • That's right: no longer do you have to maintain config files in folders; move that maintenance to a configuration table in SQL Server.
            // Application and Package come from Microsoft.SqlServer.Dts.Runtime
            pkgLocation = Path.Combine(pkgLocation, pkgName.Replace("\"", ""));
            DtsLogging mylogger = new DtsLogging();
            mylogger.Initialize(pkgName);
            Application app = new Application();

            Package pkg = app.LoadPackage(pkgLocation, null);

            // push values read from the configuration table into the package
            pkg.Variables["User::DmatchDataSource"].Value = pkgDmatchDataSource;
            pkg.Variables["User::DmatchUserId"].Value = pkgDmatchUserId;
            pkg.Variables["User::DmatchPassword"].Value = pkgDmatchPassword;
  • Error handling:
    • Take a consistent approach to error logging across your applications.  An errors collection is exposed from the C# package execution, so you can keep all your application logging in one place.
    • No more looking at an application log for one event and SQL Agent history for another.

In my case I wanted to capture the rows being sent across the wire to SQL Server (OnPipelineRowsSent), which can now be captured with logging enabled:

            // only log the OnPipelineRowsSent event, and only its MessageText column
            pkg.LoggingOptions.EventFilterKind = DTSEventFilterKind.Inclusion;
            pkg.LoggingOptions.EventFilter = new string[] { "OnPipelineRowsSent" };

            DTSEventColumnFilter ecf = new DTSEventColumnFilter();
            ecf.MessageText = true;
            pkg.LoggingOptions.SetColumnFilter("OnPipelineRowsSent", ecf);
            pkg.LoggingMode = DTSLoggingMode.Enabled;

            // hand our IDTSLogging implementation to the execution
            DTSExecResult pkgResults = pkg.Execute(null, null, null, mylogger, null);

Here is the DtsLogging class:

internal class DtsLogging : IDTSLogging
{
    ulong rowsprocessed = 0;
    Stopwatch stpWatch = new Stopwatch();
    string pkgName = "";

    public bool Enabled
    {
        get { return true; }
    }

    public void Initialize(string Pkgname)
    {
        pkgName = Pkgname;
        stpWatch.Reset();
        stpWatch.Start();
    }

    public void Log(string eventName, string computerName, string operatorName, string sourceName, string sourceGuid, string executionGuid, string messageText, DateTime startTime, DateTime endTime, int dataCode, ref byte[] dataBytes)
    {
        switch (eventName)
        {
            case "OnPipelineRowsSent":
                {
                    if (messageText == null)
                    {
                        break;
                    }

                    if (messageText.StartsWith("Rows were provided to a data flow component as input."))
                    {
                        // the row count is the last token of the message text
                        string rowsText = messageText.Substring(messageText.LastIndexOf(' '));
                        ulong rowsSent = ulong.Parse(rowsText);

                        // only count rows leaving the source component
                        if (messageText.Contains(" OLE DB Source Output "))
                        {
                            LogRowProcessedInfo(rowsSent);
                        }
                    }
                }
                break;
        }
    }

    public bool[] GetFilterStatus(ref string[] eventNames)
    {
        return new bool[] { };
    }

    void LogRowProcessedInfo(ulong rowsSent)
    {
        rowsprocessed += rowsSent;
        // write a progress entry at most once per configured interval
        // (eventLogSimple and Config are helpers from my service, not shown here)
        if (stpWatch.Elapsed.Minutes >= Convert.ToInt32(Config.Instance().EventMessageTimeInterval))
        {
            stpWatch.Reset();
            eventLogSimple.WriteEntry("Pkg: " + pkgName + ", PipelineRowsSent: " + rowsprocessed.ToString());
            stpWatch.Start();
        }
    }
}

In my case I made a Windows service to run the packages.  The C# code looks at a database to decide when to execute a particular SSIS package.  The dtsx files are kept locally on the application server, and the C# code loads and runs them there.  This uses the application server's resources, so there's no resource impact felt on the SQL Server (my DBA loves this fact).
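The core of the service boils down to a polling loop along these lines (GetDuePackages and RecordResult are hypothetical stand-ins for the reads and writes against that configuration table):

// Application/Package come from Microsoft.SqlServer.Dts.Runtime,
// same as the snippets above.
while (!stopRequested)
{
    foreach (var job in GetDuePackages())     // read the SQL schedule/config table
    {
        Application app = new Application();
        Package pkg = app.LoadPackage(job.PackagePath, null);
        DTSExecResult result = pkg.Execute(); // runs here, on the app server
        RecordResult(job, result);            // log the outcome back to SQL
    }
    Thread.Sleep(TimeSpan.FromMinutes(1));    // simple scheduling granularity
}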

 

Sometimes from miserable circumstances comes inspiration.
