Do flat files have a place in modern cloud-based architectures?

As the world embraces cloud-based solutions, many concepts and tools need to be scrutinised for their viability in this new environment.

A flat file is a means of data storage in which each entry occupies a single row of a single table in a single file, and the file stands on its own without relating to external data. An example is a CSV (Comma Separated Values) file, where each line describes a record. Flat files can be a very efficient means of storing data, as there is little, if any, overhead from the metadata or relational data found in other storage methods such as relational databases.

Flat files are most useful when data is read sequentially and each entry does not depend on other entries for context, with each row standing on its own. This makes them a great candidate for reporting aggregated data or for exporting and importing data between systems. However, flat files struggle with random-access reads: without an index, finding a particular entry means scanning the file, so lookups become more cumbersome and less performant as more entries are added.
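As a minimal sketch of that difference, assuming a small hypothetical CSV of orders (the column names and values are invented for illustration), a sequential pass visits each row once, while looking up a single record by its key still means reading from the top of the file:

```python
import csv
import io

# Hypothetical flat file contents; column names and values are illustrative only.
ORDERS_CSV = """order_id,customer,amount
1001,Alice,25.00
1002,Bob,12.50
1003,Alice,40.00
"""


def total_amount(text: str) -> float:
    """Sequential read: every row is visited exactly once, in order."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(float(row["amount"]) for row in reader)


def find_order(text: str, order_id: str):
    """Random access: with no index, a lookup scans rows from the start.

    Returns the matching row (or None) and the number of rows read.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows_read = 0
    for row in reader:
        rows_read += 1
        if row["order_id"] == order_id:
            return row, rows_read
    return None, rows_read
```

The cost of `find_order` grows with the position of the record in the file, which is the behaviour that makes random access progressively slower as entries are added.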

Because each row in a flat file is self-contained, it can be difficult to prevent duplication of data within its records: information shared between entries must be stored individually on each entry. The advantage is that everything needed to understand an entry is guaranteed to exist on that entry, but it also means that storage is inefficient, which matters where file size is a concern.

Additionally, this duplication can increase the computational load of aggregated reporting, as each entry must be individually analysed, and the uniqueness of its shared data determined, before the entries can be aggregated on that data.
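A rough illustration of this cost, assuming a hypothetical export in which the customer's name and city are repeated on every order row: to total the orders per customer, the duplicated fields have to be read and compared on every row before any grouping can happen.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: customer details are duplicated on each order row.
ROWS = """order_id,customer_name,customer_city,amount
1,Alice,Leeds,10
2,Bob,York,5
3,Alice,Leeds,7
"""


def totals_by_customer(text: str) -> dict:
    """Aggregate order amounts by the shared customer fields."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        # The shared fields are inspected on every row to decide
        # which group the row belongs to.
        key = (row["customer_name"], row["customer_city"])
        totals[key] += float(row["amount"])
    return dict(totals)
```

In a relational layout the customer would typically be stored once and referenced by a key, so the grouping would compare a short identifier rather than the full repeated values.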

The current state of flat files.

Flat files have fallen out of favour, with relational databases taking precedence in many cases. This is due to their improved performance in most modern-day use cases: most relational systems run on a managed database engine, such as SQL Server, that can manage data in memory and index it. This makes for much faster reads when searching across data sets compared to flat files, and is further helped by improvements to the size and speed of RAM.
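To make the indexing point concrete, here is a small sketch that uses Python's built-in SQLite module as a stand-in for a database engine (it is not SQL Server, and the table, column, and index names are invented for illustration). The data is held in memory and an index on the customer column lets the engine answer the lookup without the caller scanning every row:

```python
import sqlite3


def build_indexed_store(rows):
    """Load (order_id, customer, amount) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Hypothetical index name; it covers the column used for lookups.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
    conn.commit()
    return conn


def orders_for(conn, customer):
    """Return (order_id, amount) pairs for one customer."""
    cursor = conn.execute(
        "SELECT order_id, amount FROM orders "
        "WHERE customer = ? ORDER BY order_id",
        (customer,),
    )
    return cursor.fetchall()
```

A call such as `orders_for(conn, "Alice")` expresses the lookup declaratively and leaves it to the engine to decide how to use the index, which a flat file has no equivalent for.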

Another concern with flat files in the cloud is the lack of security within the file itself: anyone who gains access to the file has unfettered access to all of its contents. This is generally not an issue for files sitting in static storage, but it can be a great concern when the file is in transit. If a file is intercepted in transit, the data breach can be extremely severe, as there is no way to limit the scope of said breach.

Simply encrypting the file in its entirety would be a poor solution here, as the cost of encrypting and decrypting the whole file would quickly outweigh the convenience of the flat file’s quick sequential reading.

So, why use flat files over a relational database, or some other form of managed storage?

In modern cloud solutions, storage is cheap. Logs and auditing data are often lengthy data sets that are expensive to keep in a managed data store, and they are rarely, if ever, read by the systems themselves. Exporting logs and auditing data into flat files held in your cloud solution’s blob storage can save costs on your primary database.
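A minimal sketch of such an export, assuming a hypothetical three-field log schema and a local file path. Uploading the resulting file to blob storage is left out, since that step depends on the cloud provider's own SDK:

```python
import csv
from pathlib import Path

# Hypothetical log schema, chosen only for illustration.
FIELDS = ["timestamp", "level", "message"]


def export_logs(records, path):
    """Append log records (dicts keyed by FIELDS) to a CSV flat file.

    The header row is written only when the file is new or empty.
    """
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(records)
    return path
```

Because the file is only ever appended to and later read from start to finish, this plays to the sequential strengths of the format rather than its random-access weaknesses.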

However, exporting to flat files is only appropriate on the condition that these files remain within the ecosystem of the cloud solution, to avoid security risk, or that they do not contain sensitive information.

As with most solutions, flat files have their place in the right scenario, depending on the needs of the system. Generally, however, they can be very limiting for computational workloads: their poor random-access performance and lack of relational information lead to a poor showing in statistical analysis or live processing of data.

A flat file’s benefit is generally in the long-term storage of data, not in the storage of live or active data. It can still be useful, though, for storing data for jobs running over a queue, where the expectation is that each row can be addressed individually in succession.

This can be useful for actions that run at a set frequency, such as daily or monthly, to make calculations based on the individual events or entries within the file. For example, services that consolidate actions between different service centres might use flat files to process the sequence of those changes. Systems like YouTube that aggregate usage data between centres are one such case: view counts on videos need to be updated, but real-time information is less of a priority than keeping the system from being slowed down by multiple concurrent reads and writes on that individual field.
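The batch pattern described here might look something like the following sketch. It is an illustration only, not a description of how any real video platform works: the event file layout, the `video_id` column, and the function name are all assumptions. Each centre produces a flat file of view events, and a scheduled job folds them into the stored counts in a single pass:

```python
import csv
import io
from collections import Counter


def consolidate_views(event_files, current_counts):
    """Fold view events from several flat files into the stored counts.

    event_files: iterable of CSV texts, one per centre, each with a
                 'video_id' column (hypothetical layout).
    current_counts: dict mapping video_id to its existing view count.
    """
    pending = Counter()
    for text in event_files:
        # Each row is handled on its own, in the order it appears.
        for row in csv.DictReader(io.StringIO(text)):
            pending[row["video_id"]] += 1

    # One update per video, instead of one per individual view.
    updated = dict(current_counts)
    for video_id, views in pending.items():
        updated[video_id] = updated.get(video_id, 0) + views
    return updated
```

The stored count for each video is touched once per run, however many views arrived in the meantime, which is what relieves the contention on that single field.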