Analyzing Bank File Conversions XML Data Files using Power Query and PowerPivot

In a previous post, I showed how to use the Bank File Conversions App for Excel 2013 (BFC) to:

  • convert BAI Version 2 and BTRS files,
  • set the data in tables in Excel 2013,
  • add them to the Excel Data Model, and
  • work with them in PowerPivot.

Also, last week I demonstrated how to use SQL Server to store BFC XML output in XML data type columns. Storing the data in this way allows one to execute ad hoc queries (using XQuery) on the XML stored in the columns.

Bank File Conversions App for Excel offers several output options for BAI or BTRS data to facilitate analysis and storage. One important goal is to store the data in a format that is easily queried. That goal is met by setting the conversion to output 1) tables, which can then be added to the Excel Data Model, and 2) XML files. Although CSV files are not as easily queried (unless you are practiced with PowerShell), CSV is an essential input format for many data visualization and analysis programs.

Today I want to talk about another option: using the XML file output from BFC directly as a data source in PowerPivot. In other words, if one does not need to access multiple XML files at once, there is really no need to use SQL Server to store the XML files. If your analysis is limited to one file at a time, say month by month or quarter by quarter, you can keep your XML files in a folder and use Power Query to ETL (extract, transform and load) the data into the Excel Data Model for use with PowerPivot.

This is not to say you can’t ETL more than one XML file at a time with Power Query, but doing so might not be preferable to using SQL Server. Nor is it to say that you can’t simply ETL each file into the Excel Data Model as it comes along and store them all indefinitely in an Excel Workbook.

Eventually I will explore more of these options, but today I want to focus on the extraordinary opportunity Power Query affords us in the self-service BI realm. Power Query is the premier ETL tool for tabular data. Using Microsoft‘s newly developed “M” language, Power Query can extract from most data sources, transform the data in almost any way and load it into the Excel Data Model.

There are a few references for what we are talking about today. In particular, here are some books you can read.

In order to extract, transform and load XML data from a file, click on the Power Query tab and, in the Get External Data section, select From File, then slide down and select From XML like so:

From File Data Source Selection

Once you click From XML, a File dialog box will open and ask you to browse for and select an XML file. When you do, a Query Editor window will open where you can begin the transform process, if there is any. Today I want to extract all deposit transactions on our account where distributed availability was recorded. That is, I want to review any deposit for which we were not given immediate credit in its entirety. Banks used to call this float and most still do, but the specifications now indicate the use of the word availability rather than float.

BFC calls any transaction record (type 16) where there is distributed date availability a 16S record. And if there is distributed time availability, BFC calls it a 16VT (variable time) record. In my line of work, deposit transactions with timed availability are extremely rare, so in this post we will limit our analysis to 16S records.

So let’s select From XML and browse to our file; when we open it, the Power Query Query Editor opens for us to begin our transform. In order to get tabular data out of the XML file and into the Excel Data Model, we need to have Power Query do some transforms. Here’s what you should see:

Power Query’s Tabular Representation of the Root Node

We see a tabular representation of the root node of the XML file. The root node (<File>) is not shown, but its attributes are, as well as its first child node type, <Record>. There is only one type of child node in the BFC XML output, the <Record> node; another way to put this is that the <Record> node has no siblings. If there were additional child node types below the root (<File>) node, we would see those as additional columns (one for each unique type) in the tabular representation. For reference, here is the beginning of the XML file so you can see what Power Query is picking up:

Bank File Conversion XML Output
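For readers who want to confirm this view of the file outside Excel, here is a rough Python sketch of what Power Query is presenting. The element and attribute names (File, Record, id, version, last_modified) are taken from the M query at the end of this post; the sample values themselves are invented for illustration:

```python
# A minimal sketch of the structure Power Query sees, using a tiny
# hand-made sample mirroring the BFC layout described in this post.
import xml.etree.ElementTree as ET

sample = """<File id="1" version="2" last_modified="2014-01-01">
  <Record type="01" sequence="1" currency="USD">
    <Fields>
      <Field title="Sender" sequence="1">121000248</Field>
    </Fields>
  </Record>
  <Record type="16S" sequence="2" currency="USD">
    <Fields>
      <Field title="Amount" sequence="1">1500.00</Field>
    </Fields>
  </Record>
</File>"""

root = ET.fromstring(sample)

# The root (<File>) itself is not shown as a row; its attributes become
# columns, and each distinct child element type becomes one more column.
print(root.tag)                       # File
print(sorted(root.attrib))            # ['id', 'last_modified', 'version']
print({child.tag for child in root})  # {'Record'} - only one child type
```

Because every child of <File> is a <Record>, the tabular view has exactly one child-node column, which matches the single Record column in the screenshot.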

Now, to step down through the node types (Record, Fields, Totals, Total, Addenda, Batches, Tags, etc.) and view Power Query‘s tabular representation of each node type, you will need to click on the word Table (that the arrow is pointing to). Each instance of that node type will appear as a row in that table, and there will be columns for each child node type as mentioned above. In plain English, you will see the following:

Power Query's Tabular Representation of <Record> Node Contents

which shows all four child node types of <Record>, namely <Fields>, <Totals>, <Batches> and <Addenda>, as columns, and every instance of the <Record> node type, subject to size limits (in this example 428 instances), as rows. You will also see each <Record> instance’s attribute values in the <Record> instance rows.

If you click on the Table link in any one of the <Record> instance rows, in any child node column, you will see a representation of all the instances of that node. So if we click on the first Table in the <Fields> column, we will see a table of all the instances of <Fields> for that particular <Record> node. We see one row because, per the schema, <Record> can have only one <Fields> child node. Also, we see only one column because <Fields> has only one possible type of child node, <Field>.

The First <Record> Node's <Fields> Table

Now we have only one choice left, which is to see the Table representing this particular <Fields> node, so we click on the Table link:

The <Field> Node Tabular Representation

So now we see the table representing the furthest descendant of the root node, following the <Fields> axis off the root (<File>) node. We could have followed the <Totals>, <Batches> or <Addenda> nodes in a similar fashion.
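The drill-down we just clicked through amounts to following one axis of the tree. A small Python sketch of the same path, using the same illustrative sample structure as before (real BFC files will have many more records and fields):

```python
# Following the <Fields> axis down from <File>, as the successive Table
# clicks did: File -> first Record -> its Fields -> its Field children.
# Element names come from the post; the sample values are made up.
import xml.etree.ElementTree as ET

sample = """<File>
  <Record type="16S" sequence="1" currency="USD">
    <Fields>
      <Field title="Amount" sequence="1">1500.00</Field>
      <Field title="FundsType" sequence="2">S</Field>
    </Fields>
  </Record>
</File>"""

root = ET.fromstring(sample)
first_record = root.find("Record")    # click Table in the Record column
fields = first_record.find("Fields")  # click Table in the Fields column
field_nodes = fields.findall("Field") # the furthest descendants

print(len(field_nodes))               # 2
print(field_nodes[0].get("title"))    # Amount
```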

This exercise was to show you the significance of the Table structure and how it represents an XML file. Selecting a table in an instance of a node restricts you to that node’s data only, and we can’t get at the text node values this way.

Another thing to notice in the Query Editor is that each click of a Table link was recorded as an “Applied Step” in the Query Settings pane off to the right. This is a listing of the steps you are executing to transform the data. Power Query saves those steps and incorporates them into the final query, which you then save for future use. This is extremely powerful because the steps are written in a new language developed specially for this purpose by Microsoft, called “M“. The language includes some wonderful functions, and as your skills develop you can write entire ETL sequences without using the Query Editor.

In order to load the data we need into Power Pivot, we need to transform the XML file into a table of all the 16S (transactions with distributed availability) records, including attribute values and all text node values. We couldn’t do this in the above exercise because we were basically following an axis by clicking on successive Table links. In the following exercise we will instead be expanding nodes to create this table to load into Power Pivot.

Starting at the same place as before (right after clicking on the From XML selection), we select the expand button next to the <Record> column header. I have drawn a red arrow pointing at the button and highlighted it in yellow.

The Expand Button

The Expand button essentially creates a table that has a row for each <Record> node, with a column for each child node or attribute you select to include. In this case, when we click on the Expand button we see the following options presented:

<Record> Expand Button Selections

Before we address checking or unchecking the elements we want to see in our expansion, we want to delete all the <File> node attributes (all the columns other than <Record>) because we don’t need those in our result table. We just right-click on each column header, select Remove, and we now see:

Selecting <Record> Elements to Include in Expansion

In our case we are going to focus on <Record> node attributes and <Field> node attributes and text values. We will uncheck Totals, Batches and Addenda because we won’t need any data from those nodes. We do need the <Record> node attribute columns, so we will keep them. Click OK and you should see:

Selecting <Record> Node Expansion Elements

So now we have the first level of expansion of the <File> tabular representation, which gives us a row for each <Record> node instance and a column for each attribute and child node type of the <Record> node instance. In this case there is only one child node available (<Fields>) because we unchecked the other possible child nodes. We also now see that we have the <Record> attributes (record type, sequence number and currency code) that we want in our table.

First Expansion – <Record> Node Instances in Tabular Format
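In Python terms, the Expand step is roughly a flattening of each <Record> into one row whose keys are the selected attributes and child nodes. The column names below follow the Record.Attribute:* naming Power Query generates (as seen in the M listing at the end of the post); the sample data is made up:

```python
# A rough Python analogue of Power Query's Expand step: one row per
# <Record>, with a column per selected attribute or child node.
import xml.etree.ElementTree as ET

sample = """<File>
  <Record type="16S" sequence="7" currency="USD"><Fields/></Record>
  <Record type="16" sequence="8" currency="USD"><Fields/></Record>
</File>"""

root = ET.fromstring(sample)
rows = [
    {
        "Record.Attribute:type": rec.get("type"),
        "Record.Attribute:sequence": rec.get("sequence"),
        "Record.Attribute:currency": rec.get("currency"),
        # Still a nested node - like the Table links in the Fields column,
        # it needs a further expansion of its own.
        "Record.Fields": rec.find("Fields"),
    }
    for rec in root.findall("Record")
]
print(len(rows))                         # 2
print(rows[0]["Record.Attribute:type"])  # 16S
```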

We still don’t see the <Field> text nodes, so we need to expand the <Fields> node. Click on the Expand button in the Record.Fields column header like so:

Record.Fields Expansion

There are no attributes on the <Fields> node in our XML file, and the only possible child node type is <Field>, so we need not uncheck anything. We simply click OK and we see:

Record.Fields.Field Instances with Record Attributes

In order to get the text node element values we need to expand the <Field> node instances one more time. Click on the Expand button on the Record.Fields.Field header to see the elements to include in the expansion as follows:

Record.Fields.Field Expansion Selection Choices

Notice the text node for each <Field> element is labeled Element:Text, which we definitely want, and we want both attributes as columns in our table as well. Click OK and we see:

Record.Fields.Field.Element:Text Instances Expansion with Attributes

We are almost there, but as you can see from the record type attribute (Record.Attribute:type), we have all record types in our table and we are really only interested in distributed availability transaction (16S) records. In this case we simply filter the column using the filter down-arrow in the Record.Attribute:type column header like so:

Filtering the Record Type Attribute Column

We uncheck all selections except 16S, click OK, and we get the table we originally wanted:

Data Table with 16S Records
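The filter step corresponds to the Table.SelectRows call in the M listing at the end of the post: keep only rows whose record type equals "16S". A Python sketch with invented sample rows:

```python
# Sketch of the filter step: keep only rows whose record type is "16S"
# (the distributed-availability records). Sample rows are illustrative.
rows = [
    {"RecordType": "16", "FieldValue": "100.00"},
    {"RecordType": "16S", "FieldValue": "1500.00"},
    {"RecordType": "88", "FieldValue": "continuation"},
]
filtered = [r for r in rows if r["RecordType"] == "16S"]
print(len(filtered))  # 1
```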

We should name the query something other than Query1. In the box that currently shows Query1, type in the name BFC 16S Record Query. I don’t like the column names, however, nor the order of the columns. So I will click on the column headers of a few columns, drag them to new positions in the table and rename them (right-click, Rename) as follows:

  1. RecordType (Record.Attribute:type)
  2. RecordSeqNo (Record.Attribute:sequence)
  3. RecordCurrency (Record.Attribute:currency)
  4. FieldTitle (Record.Fields.Field.Attribute:title)
  5. FieldValue (Record.Fields.Field.Element:Text)

I have also deleted the Record.Fields.Field.Attribute:sequence column, as it is not necessary for our analysis and would interfere with the next step.

We have one more very important transformation, one that demonstrates the power of Power Query (living up to its name). The FieldTitle column needs to be pivoted, meaning we need the FieldTitle values to have their own columns with the FieldValues placed in the appropriate column. Otherwise, Pivot Table analysis in PowerPivot would be very difficult, calling for the creation of numerous Calculated Columns and/or Fields. So we highlight the column we want to pivot (FieldTitle) and click Transform –> Pivot Column (in the Any Column area), and we will see:

Pivot Column Dialog Box

The default selections will not match the image above, so make these changes: 1) the values column needs to be FieldValue (this says put these values under the appropriately titled column corresponding to the FieldTitle value) and 2) in the advanced options, select Don’t Aggregate, because we want the detail, not counts of the detail. Click OK and it will transform the table to look like this:

Data Table with FieldTitles Pivoted
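The pivot (with Don't Aggregate) can be sketched in Python as grouping rows by record and turning each distinct FieldTitle into its own key. The titles and values here are invented; real BFC field titles will differ:

```python
# Sketch of the Pivot Column step (don't aggregate): each distinct
# FieldTitle becomes its own column, keyed here by RecordSeqNo.
from collections import defaultdict

rows = [
    {"RecordSeqNo": "7", "FieldTitle": "Amount", "FieldValue": "1500.00"},
    {"RecordSeqNo": "7", "FieldTitle": "FundsType", "FieldValue": "S"},
    {"RecordSeqNo": "9", "FieldTitle": "Amount", "FieldValue": "250.00"},
]

pivoted = defaultdict(dict)
for r in rows:
    # One output row per record; one column per field title.
    pivoted[r["RecordSeqNo"]][r["FieldTitle"]] = r["FieldValue"]

print(dict(pivoted["7"]))      # {'Amount': '1500.00', 'FundsType': 'S'}
print(pivoted["9"]["Amount"])  # 250.00
```

This is why deleting the FieldSeqNo column first matters: with it still present, two rows for the same record would no longer collapse into one pivoted row.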

Our table is done now. In order to load the table into the Excel Data Model, we click on File –> Close and Load To and we will see the following:

Close and Load Dialog Box

The default options are different from the ones shown, so be careful. I typically don’t want the table set in Excel but rather in the Excel Data Model, so I checked Only Create Connection and Add this data to the Data Model. If you only want to work with the data as a table in Excel, you can just accept the default options. When you click the Load button, you will see Power Query load the data from the XML file into the Excel Data Model like so:

Power Query Loading the Table into the Excel Data Model

When it is done, we will see our new query (BFC 16S Record Query) listed under Workbook Queries, below the query called Batches that was already added to the workbook previously, like so:

Power Query Loading Complete

To show that the data table is in fact loaded into the Excel Data Model click on POWERPIVOT –> Manage like so:

Select Manage to Switch to Excel Data Model

And you will see the 16S Record table Power Query added to the Excel Data Model as follows:

16S Record Table Added to Excel Data Model

In order to analyze the data, click on the Pivot Table button in the center of the ribbon in the above image and Excel will prompt you for a location for the table like so:

Create Pivot Table Dialog Box

Click OK and the sheet should look like this:

Pivot Table Created

I will close the Workbook Queries pane and build a quick Pivot Table to show the data like so:

Sample Pivot Table Using Data Table

The Query Editor that we used in Power Query is a user interface that helps construct the ETL steps in the M language. For those who would like to simply paste the code into the Advanced Editor (in the View tab), here it is:

let
    Source = Xml.Tables(File.Contents("C:\Users\John\Desktop\BAIConversion\Output Files\BTRS MAX XML Only - XML File from BTRS-Quad Sample - 8.5 MB.xml")),
    #"Removed Columns" = Table.RemoveColumns(Source,
        {"http://www.w3.org/2001/XMLSchema-instance", "Attribute:id",
         "Attribute:version", "Attribute:last_modified"}),
    #"Expand Record" = Table.ExpandTableColumn(#"Removed Columns", "Record",
        {"Fields", "Attribute:type", "Attribute:sequence", "Attribute:currency"},
        {"Record.Fields", "Record.Attribute:type", "Record.Attribute:sequence",
         "Record.Attribute:currency"}),
    #"Expand Record.Fields" = Table.ExpandTableColumn(#"Expand Record",
        "Record.Fields", {"Field"}, {"Record.Fields.Field"}),
    #"Expand Record.Fields.Field" = Table.ExpandTableColumn(#"Expand Record.Fields",
        "Record.Fields.Field",
        {"Element:Text", "Attribute:sequence", "Attribute:title"},
        {"Record.Fields.Field.Element:Text",
         "Record.Fields.Field.Attribute:sequence",
         "Record.Fields.Field.Attribute:title"}),
    #"Filtered Rows" = Table.SelectRows(#"Expand Record.Fields.Field",
        each ([#"Record.Attribute:type"] = "16S")),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",
        {{"Record.Fields.Field.Element:Text", "FieldValue"},
         {"Record.Fields.Field.Attribute:sequence", "FieldSeqNo"},
         {"Record.Fields.Field.Attribute:title", "FieldTitle"},
         {"Record.Attribute:type", "RecordType"},
         {"Record.Attribute:sequence", "RecordSeqNo"},
         {"Record.Attribute:currency", "RecordCurrency"}}),
    #"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",
        {"RecordType", "RecordSeqNo", "RecordCurrency", "FieldTitle",
         "FieldSeqNo", "FieldValue"}),
    #"Removed Columns1" = Table.RemoveColumns(#"Reordered Columns", {"FieldSeqNo"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns1",
        List.Distinct(#"Removed Columns1"[FieldTitle]), "FieldTitle", "FieldValue")
in
    #"Pivoted Column"

You will need to change the file path, but if you use the XML output from the conversion of a BTRS file with similar content, everything should be the same.

This was a very long and involved post. If you see any errors, inefficiencies or inconsistencies please comment.

About John Donnelly

Businessman, Economist, Developer, Writer
This entry was posted in Bank File Conversion, Working With the XML Data.