Storing Bank File Conversion XML Output in SQL Server

The purpose of this post is to give users (or prospective users) an idea of the benefits of storing their bank data in SQL Server (2005 or greater) as an XML data type. In particular converting BAI and BTRS files from the bank format using the we developed called Bank File Conversion (BFC) App for Excel 2013 which can be found here. There is a Free Trial offer to allow you to thoroughly test the app for your purposes.

The BFC App for Excel 2013 securely converts your BAI or BTRS file into either Excel tables, CSV files or XML files. The XML files conform to a fully deterministic schema that is included with the App on the XSD tab. You can cut and paste it anywhere needed but the T-SQL to setup the schema collection is shown below. The schema can be seen in your browser by clicking here.

I run the BFC App for Excel once a month for each bank account I am responsible for and I use PowerPivot to analyze the data for float issues, unusual transactions and to run through some Pivot Charts. I also compare transaction code statistics to previous periods. I also save the monthly conversion output as an XML file in my Bank File Conversion database for querying should auditors or others require data extracts.

First we need to create a database which I called BankFileConversion. Next we need to create a schema collection. The schema collection is necessary so SQL Server knows how the XML is to be structured and is important for XQuery to work properly for data retrieval. If you store XML data in a column of your table as an XML datatype and would like to use XQuery to query the data down to the text node level (the values in the XML node text and not just XML fragments) you must have a fully deterministic schema added to the database’s schema collection. The database is added:

BankFileConversion database setup for tables to hold the XML datatype and other data.

BankFileConversion database setup for tables to hold the XML datatype and other data.

 

 

 

 

 

 

The T-SQL query to create the database as seen above and add the schema to the schema collection is as follows:

CREATE DATABASE BankFileConversions;
GO
USE BankFileConversions;
GO
CREATE XML SCHEMA COLLECTION BankFileConversionsSchemaCollection AS N'

... this portion of the schema omitted ....

';
GO

After running the above statements, we will want to add the table using T-SQL as follows:

CREATE TABLE [BankFileConversions].[dbo].[BankFile](ID int primary key,
BAI xml(BankFileConversionsSchemaCollection), BankMonth nvarchar(10),
BankYear smallint, RowAddedDate smalldatetime) 

As you can see, we created the table with an ID column and a column labeled BAI where we need to specify the XML data type and schema collection. We also included a column for the month the bank data covers, the year and the date this row was added to the table (which can be automated). Now we execute the T-SQL to insert a row into the table as follows:

INSERT INTO [BankFileConversions].[dbo].[BankFile] values(1,'
 <ns:Record type="01" seq="1" currency="">
 <ns:Fields>
 <ns:Field seq="1" title="RecordCode">01</ns:Field>
 <ns:Field seq="2" title="SenderID">121000248</ns:Field>
 <ns:Field seq="3" title="ReceiverID">FSATU014</ns:Field>
 <ns:Field seq="4" title="FileCreationDate">06/02/14</ns:Field>
 <ns:Field seq="5" title="FIleCreationTime">1559</ns:Field>
 <ns:Field seq="6" title="FileIDNum">01</ns:Field>
 <ns:Field seq="7" title="PhysRecLngth">080</ns:Field>
 <ns:Field seq="8" title="BlockSize"></ns:Field>
 <ns:Field seq="9" title="VersionNum">2</ns:Field>
 </ns:Fields>
 </ns:Record>
 <ns:Record type="02" seq="2" currency="">
 <ns:Fields>
 <ns:Field seq="1" title="RecordCode">02</ns:Field>
... this portion of the XML omitted ....

 <ns:Record type="49" seq="10565" currency="USD">
 <ns:Fields>
 <ns:Field seq="1" title="RecordCode">49</ns:Field>
 <ns:Field seq="2" title="AcctCtrlTotal">3964875.54</ns:Field>
 <ns:Field seq="3" title="NumOfRecords">106</ns:Field>
 </ns:Fields>
 </ns:Record>
 <ns:Record type="98" seq="10566" currency="USD">
 <ns:Fields>
 <ns:Field seq="1" title="RecordCode">98</ns:Field>
 <ns:Field seq="2" title="GroupCtrlTotal">9464569.79</ns:Field>
 <ns:Field seq="3" title="NumOfAccts">3</ns:Field>
 <ns:Field seq="4" title="NumOfRecs">462</ns:Field>
 </ns:Fields>
 </ns:Record>
 <ns:Record type="99" seq="10567" currency="">
 <ns:Fields>
 <ns:Field seq="1" title="RecordCode">99</ns:Field>
 <ns:Field seq="2" title="FileCtrlTotal">24198852765</ns:Field>
 <ns:Field seq="3" title="NumOfGroups">42</ns:Field>
 <ns:Field seq="4" title="NumOfRecs">10707</ns:Field>
 </ns:Fields>
 </ns:Record>
BaiFile>', 'May', '2014', '06/02/2014 09:16:00')

Now that we have the database created, the schema loaded, and the table created and loaded, we can now run a few queries. To do that we will not be querying rows of data in a SQL Server table but rather XML data in rows that is contained in XML datatype columns. That is, we will be searching in XML data structures saved in columns in rows of a table. Our table, dbo.BankFile happens to have only on row representing May of 2014 bank data.

XQuery is rather involved but you can get up to speed relatively quickly with the help of Jason Strate’s blog series here, some great books listed below and some other resources you can find by searching on “XQuery.” XQuery, to me, is a language that allows you to run XPath queries on the SQL Server XML datatype. If you are not familar with XPath I would read up on it at W3C Schools here.

Let’s just dive right in a take a look at a simple query that returns all of the Total elements for the month of May and year 2014:

WITH xmlnamespaces('http://www.bankfileconversions.com' as ns)
SELECT [BAI].query('/ns:BaiFile/ns:Record/ns:Totals/ns:Total')
FROM [BankFileConversions].[dbo].[BankFile] WHERE BankMonth='May'
and BankYear='2014'

The results of the query can be seen here in SQL Server Management Studio:

Total Element Query Results

Results returned from XQuery retrieving all Total elements – Click to Enlarge

The results show in a row as an XML fragment highlighted to indicate a link. When you click on the resulting XML fragment link a new tab will open up to display the XML like so:

XML Fragment Result of Totals XQuery

XML Result Fragment exploded on a new tab – Click to Enlarge

The XML fragment is approximately 3,000 lines and so it is not all shown here. A typical medium-sized business with multiple accounts will have as many lines. It is important to know that the query returned XML nodes from the column queried. This is not typically useful for those of us who use Excel. We need tabular data to add to our PowerPivot data model.

If however, you were simply running an ad hoc query to view in XML Notepad for example, this would be fine. Since the query result is a set of nodes though, you will need to create a root node (in this case named sample) to wrap the whole thing for XML Notepad to recognize it as valid XML (otherwise you will only see the first node because that node by itself is recognized as valid XML).

Totals Query Results in XML Notepad

Totals Query Results in XML Notepad – Click to Enlarge

So the question then is how do we get tabular data that we can add to our PowerPivot data model? We shred the XML using the CROSS APPLY command. This is complicated stuff where normally only dbas tread. I suggest though, that one can become reasonably proficient through experimenting and reading one of these very excellent books:

So the CROSS APPLY command has to do with making a table out of XML data and I won’t try to explain it any further because I am not qualified. However rather than retrieve one large XML fragment, the following XQuery statement retrieves a row for each row containing a single Total node; but still as a fragment:

WITH xmlnamespaces('http://www.bankfileconversions.com' as ns)
SELECT c.query('.')
FROM BankFileConversions.dbo.BankFile
CROSS APPLY BAI.nodes('/ns:BaiFile/ns:Record/ns:Totals/ns:Total') as t(c)

Notice two things; 1) the SELECT statement has no XPath statement and the CROSS APPLY statement now has one. To keep it simple I see this as the CROSS APPLY creates a set of, in this case, BAI.nodes using the XPath query passed in. Then my query method called on that set of nodes (c) says that c should just return itself (using the dot “.” operator). So I get the set of Total nodes with each node returned separately as follows:

Totals Query using CROSS APPLY

Totals Query using CROSS APPLY – Click to Enlarge

Now if I click on one of these XML fragments in the rows in the result set, I get a single Total node like so:

Single Total Node

Single Total Node – Click to Enlarge

Even though we have shredded the requested node set into individual nodes, the question still remains how do we get tabular data?

In order to do this we need to tell the SELECT statement to retrieve individual values from the XML. We want to see the Field text values in columns with a row for each Total node in the set requested. I’m really only interested in repeating type code (RepTypeCode), code description (CodeDesc) and the repeating amount (RepAmount) fields so the value() methods will request only those text node values. I changed the XPath query in the CROSS APPLY command so I have the full set of Record nodes to work with. This gives me access to the Record‘s sequence attribute value to include as an ID for the Record the Totals are from.

WITH xmlnamespaces('http://www.bankfileconversions.com' as ns) 
SELECT c.value('(@seq)[1]','INT') AS RecSeq,
 c.value('(ns:Totals/ns:Total/@seq)[1]','INT') AS Total1Seq,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="CodeDesc"]/text())[1]','nvarchar(max)')
     AS CodeDesc1,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="RepAmount"]/text())[1]','decimal(10,2)')
     AS Amount1,
 c.value('(ns:Totals/ns:Total/@seq)[2]','INT') AS Total2Seq,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="CodeDesc"]/text())[2]','nvarchar(max)')
     AS CodeDesc2,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="RepAmount"]/text())[2]','decimal(10,2)')
     AS Amount2,
 c.value('(ns:Totals/ns:Total/@seq)[3]','INT') AS Total3Seq,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="CodeDesc"]/text())[3]','nvarchar(max)')
     AS CodeDesc3,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="RepAmount"]/text())[3]','decimal(10,2)')
     AS Amount3,
 c.value('(ns:Totals/ns:Total/@seq)[4]','INT') AS Total4Seq,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="CodeDesc"]/text())[4]','nvarchar(max)')
     AS CodeDesc4,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="RepAmount"]/text())[4]','decimal(10,2)')
     AS Amount4,
 c.value('(ns:Totals/ns:Total/@seq)[5]','INT') AS Total5Seq,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="CodeDesc"]/text())[5]','nvarchar(max)')
     AS CodeDesc5,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="RepAmount"]/text())[5]','decimal(10,2)')
     AS Amount5,
 c.value('(ns:Totals/ns:Total/@seq)[6]','INT') AS Total6Seq,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="CodeDesc"]/text())[6]','nvarchar(max)')
     AS CodeDesc6,
 c.value('(ns:Totals/ns:Total/ns:Fields/
   ns:Field[@title="RepAmount"]/text())[6]','decimal(10,2)')
     AS Amount6
FROM BankFileConversions.dbo.BankFile
CROSS APPLY BAI.nodes('/ns:BaiFile/ns:Record[@type="03"]') as t(c)

The result set will be the field text values requested for the first 3 Total children of each Totals node in the set created by CROSS APPLY. There is a specific syntax to the value() method which includes an XPath statement that identifies a scalar value and the data type of that value. The SELECT statement will cycle through the set of nodes that are Records and their descendants (as specified in the CROSS APPLY XPath query). Then the XPath queries in the value() methods start the search path at the Record node level and move through to the Field descendant with the requested attribute value.  In order to get an individual value (a scalar) we need to specify an item number as in [1], [2] etc.. A [1] tells the XQuery to get the first value in the set returned by the XPath query in the value() method. So the results we get are as follows:

Tabular results from Totals XQuery

Tabular Results for XQuery – Click to Enlarge

There are 84 rows in all which matches the number of  03 records in the BAI file. None of the 03 records have more than 3 Total nodes which is why we see NULL in the columns related to a 4th, 5th or 6th Total.

We covered a lot material in this post so please put your questions or comments in the comments. I will post in the near future a whole list of XQuery statements that will work for you. As your understanding  grows, you can modify and test new approaches using the XQuery statements I supply.

Stay tuned.

About John Donnelly

Businessman, Economist, Developer, Writer
This entry was posted in Apps for Excel, Bank File Conversion, Working With the XML Data and tagged , , . Bookmark the permalink.