Example 1 (HR)
Consider this simple table: it contains 4 columns, organized into a name and address grouping. There is also some meaning in the order of the columns.
A team in HR may manage such information, and to them they would perhaps call this a staff data table.
Another characteristic of this spreadsheet is that any value – string or number – can be entered in any cell. Mistakes can easily be made, but entering values is easy and convenient, so corrections are simple also.
Contrast this with how an IT developer would look at this table. A traditional database requires a predefined structure in which the columns in a table are predetermined with a certain level of definition before data values can be saved.
This MySQL statement creates these same four columns and adds one more:
CREATE TABLE Persons (
PersonID bigint(20) NOT NULL AUTO_INCREMENT,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255),
PRIMARY KEY (`PersonID`),
);
Before any data is saved, a structure is predefined in the form of a data table with named columns, each column being fixed with a predefined data type and data type size. Best practice requires a unique identifier with every table entry, hence the additional personID column .
Once this table has been created, data values can then be inserted in the database. However if an attempt is made to insert a value that does not match type – for example saving a string when a number is expected – an error is thrown and the values are not saved.
Notice also that information is lost: the name and address grouping the spreadsheet had is not saved in the database. An application developer can easily recreate this when presenting back this data, but this highlights a critical point: the art of converting data to information is relinquished by the business team to the IT developer.
Example 2 (Finance)
Information used by a finance (or sales) team is being managed in this spreadsheet:
- Sales quantities in units (#) and dollars ($) are organized in separate columns by year and quarter.
- The table header has multiple rows to facilitate this organization.
- The quarterly sales ($) amounts are calculated by multiplying the sales quantity (#) by the price each.
- The $ quantities have been formatted to two decimal places.
So on one sheet, we have values captured and organized, values modeled (equations calculating price), and value presentation (e.g., formatting). Bold text and shading also aid in understanding.
Contrast this with how an equivalent IT application could be implemented:
- First, the three-row header would be removed. Columns in a database have one label and type.
- Creating new columns each quarter as new data is entered is not efficient, so the data needs to be unpivoted (or flattened), which adds rows as more data comes in each quarter. Adding data on a regular basis using new rows in a database is the norm, using new columns is not.
- Two tables are also needed: one for the fixed price and the other for the sales quantities.
So the tables as saved in a database would look this this (table 2 is truncated):
While the same data has seemingly been saved, Information has been lost in this process: how the table was organized, the calculations that were used, and the way the data was presented.
Again for an IT developer, recreating this presentation with software in an application would not be a problem. But they are re-created by an algorithm in whatever IT application is being created. As with the first example, the art of converting data to information is relinquished by the business team to the IT developer.
There’s also an issue with the original spreadsheet: The ‘Price Each’ column will actually vary over time. So a later update may be needed to add this as a quarterly column as well. Easy enough in a spreadsheet, a complete redesign for the database ‘schema’ and a change to the software for IT.
Takeaways
Spreadsheets allow for
- values to be entered into a table cell without restriction as default (although settings can restrict if needed later). Certainly no thought is given to the size that strings need to be nor if a cell type is numeric or text or something else, nor that table rows may or may not need a unique ID.
- a user to organize values by arranging columns a certain way and grouping columns as needed. Adding titles, column names, or formatting or adding validation rules can come later.
- models to be applied, through use of cells with equations
- values to be presented in some beautified tabular format, or with a chart
- changes to any of the above to be made quickly
Spreadsheets are an information management tool, not a data management tool. Business teams use information – not data – to run their critical day to day operations. This is why spreadsheets have been such a killer application for decades.
While IT tools have developed and grown powerful over the years, we believe that at their heart they are built on managing data, not information. There is a need for a new class of tool that from the ground up manages information, not data.
Such a tool would offer the same conveniences as a spreadsheet but be oriented around a database, not data files. Example behaviors include:
- Importing values without care for type
- Saving information tables that have columns organized: for example have multiple level header rows
- Provide grid-like responsive ease of use
- Enable information modeling without the need for programming, such as
- joining tables without type or uniqueness preconditions
- concatenating tables by auto-aligning columns
- adding custom columns to other data sources
- Saving information models
- Saving information views, tables, charts, dashboards
- Providing unique URLs enabling any item to be shared with users or other systems
- Fail gracefully, allowing users to fix data quality issues as they go
- Enable editing values from models or views
Our Oasis product provides such conveniences and more.