Model Basics – Joining Tables

By way of example to show joins in Oasis, consider these two tables: one (Table A) has random data, but includes a US State column. The other table (State Info) contains details of each US State:

Note that the state table has header that contains two rows.

Joining tables means that we want to append row cells from one table onto the other based on a unique identifier. In this case we want to append state name, abbreviation and capital to the end of the rows of Table A, using the state name as the unique identifier.

To do this is Oasis, we select, in order, Table A, then State Info tables from the hierarchical list:

Of the options that show, we select Join Data. This results in a comprehensive popup:

Points to note:

  • At the top left are Left Table, Joining Elements, and Right Table areas. These contain the table column names. To make a join work, at least one pair of columns need to be matched. Here the State column from the left table is matched with the State column from the Right table.
  • At the top right is the Output Table. Here we select the columns we want to include in the output. Columns that start with ‘join.’ are from the right table indicating that they are being joined in.
  • Bottom left is a dropdown indicating join type. Left, right, inner and outer joins are available.
  • The Invert output checkbox reverses the logic of the join selected, so rows normally discarded are included and included rows are discarded.
  • The Collapse output checkbox is an option to use when identifiers are not unique. More on this later.
  • The table on the bottom right is a preview.
  • The fact that Table A has a one row header, and State Info a two row header is not an issue. The output table has a two row header.

Here we’ve created a model that performs all four join operations. The highlighted step (Join Data Right) has its output shown on the right.

A traditional prerequisite of joining data is that the unique identifier is unique in the right table. As the example above shows, the left table (Table A) seeks to join data from the right table (State Info), and the lookup requires the right table to only have one entry for that identifier. So ‘California’ can be used any number of time in rows in the left table to pull the supposedly unique ‘California’ data from the right table.

Oasis has no such prerequisite. This is in line with the information management philosophy integral to the platform. Part of information management includes data quality cleanup where issues with source data can be easily found and fixed. Therefore Oasis makes best effort to execute models against the data irrespective of quality issues, so enabling the output to be used to help highlight where issues are. This is why Oasis also enables data to be edited from model output and views.

To illustrate this we can reverse the order, so the left table is State Info and we join right data from Table A:

Points to note:

  • The left table State Info should have unique rows, but here we see duplication with California and Arizona for example
  • Comparing with the above right join, this table is the same except for the column order.

To stop duplication like this in the left table, the ‘Collapse output’ option can be used:

Now the rows for Arizona and California are not duplicated. Instead the values in the joined cells are collapsed into an array. Moving the mouse over the cell provides the array contents:

Oasis keeps track of where data comes from. We can right click on a cell to Edit data via the context menu that shows:

In this example the Edit Source Data popup shows five rows from Table A that could be edited. (Row 0, 5, 9, 13, 14 labels are selectable tabs)

This post has shown how easily tables can be joined and how data management restrictions in place in other tools do not exist in Oasis. It also has shown how data quality issues can be fixed easily when working with models.

Back to Exploring

trudain.com | All content copyright (c) 2022-2024