Life, the Universe and Everything
Before diving into the specific details of implementation and code structure, it is important to understand the intellectual context within which we are operating. A dictionary definition is always a good place to start. The Cambridge Dictionary defines "data" as follows:
noun [ U, + sing/pl verb ]
US /ˈdeɪ.t̬ə, dæt̬.ə/
information, especially facts or numbers, collected to be examined and considered and used to help decision-making, or information in an electronic form that can be stored and used by a computer
This might be a good place to start; however, it is at once an oversimplification and unnecessarily prescriptive for our purposes. For a more detailed exploration we will start from a different place.
We will also consistently consider "data" to be a singular noun, which you are free to pronounce however you like.
Let's start with the universe.
This might not seem like the simplest place to start, but in theory we can just consider the universe as everything, right now. So it is a very simple definition which encompasses a lot of complexity.
Within the universe we have entities. In their simplest terms, entities can be thought of as anything (or any thing). People, places, objects, animals... in linguistic terms they are nouns.
Entities have relationships to other entities. These relationships can change over time, but since we are considering the universe to be everything, right now, we can ignore that for now.
Entities have state, which can be considered as characteristics of the entity. Some states are permanent and some are transitory.
Entities experience events. Stuff happens.
So this is our simple model of everything happening in the universe right now.
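To make the model concrete, here is a minimal sketch in Python. It is only one possible way to write the four ideas down (entities, relationships, state and events), and every name in it (Entity, Relationship, Event, Universe, and the example values) is an assumption made up for illustration rather than something prescribed by the model itself.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Any thing: a person, a place, an object, an animal..."""
    name: str
    # State: characteristics of the entity, permanent or transitory.
    state: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A link between two entities, as it stands right now."""
    source: Entity
    target: Entity
    kind: str


@dataclass
class Event:
    """Something that happens to an entity."""
    entity: Entity
    description: str


@dataclass
class Universe:
    """Everything, right now: a single snapshot with no history."""
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    events: list = field(default_factory=list)


# Hypothetical example values, purely for illustration.
alice = Entity("Alice", state={"species": "human", "mood": "curious"})
london = Entity("London", state={"kind": "city"})

universe = Universe(
    entities=[alice, london],
    relationships=[Relationship(alice, london, "lives in")],
    events=[Event(alice, "missed the bus")],
)
```

Note that the snapshot holds no timestamps: because we are considering the universe to be everything, right now, changes over time are deliberately left out of the sketch.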
We finally arrive at the first word of the dictionary definition: information. Information can be considered as a representation of all states, events and entity relationships in the universe. Whether it exists independently of its perception and/or observation is an interesting philosophical question. If a tree falls in the forest and nobody hears, does it make a sound?
Now we arrive at this critical juncture: data. Data can be considered to be a store of information. In the modern world, data is often assumed to be digital (i.e. not physical), but in its essence we consider it as a physical store of information. Data has location, structure and context, which will be explored in more detail in the next section.
So what is the purpose of all of this? At a high level, the objective of any human or system interacting with data is to extract meaning from the data, in order to achieve something. Sometimes an insight is enough; often, however, the insight is sought in order to inform decision making and take specific action.
Business Intelligence and dashboarding platforms are designed to plug directly into this data and enable humans to create charts and reports to attempt to glean insights from data, typically requiring human interpretation.
Modern machine learning systems are often designed to cut straight to the action part of the process, circumventing any need for human interaction, interpretation or intervention, which has significant associated benefits and risks.
The word metadata is often misused, so it is important to define it clearly at this point too: metadata is data which describes characteristics of data, for example a timestamp of when the data was collected, or the type of data in each structure.
The term should not be used to refer to data related to an entity, state or event. Metadata contains information about the context, location, structure and lineage of data, but not its content.
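The distinction can be illustrated with a short Python sketch. The records, field names, file name and the describe_structure helper below are all hypothetical, chosen only to show which side of the line each piece of information falls on.

```python
from datetime import datetime, timezone

# Content: information about the state of entities (hypothetical values).
content = [
    {"name": "Alice", "height_cm": 170},
    {"name": "Bob", "height_cm": 182},
]

# Metadata: describes the data itself, not the entities it is about.
metadata = {
    "collected_at": datetime(2024, 1, 1, tzinfo=timezone.utc),  # context
    "location": "people.csv",  # where the data is stored (hypothetical)
    "structure": {"name": "str", "height_cm": "int"},  # type of each field
    "lineage": "manual survey",  # where the data came from (hypothetical)
}


def describe_structure(rows):
    """Derive structural metadata (field name -> type name) from the first row."""
    return {key: type(value).__name__ for key, value in rows[0].items()}
```

Alice's height is content, because it is the state of an entity. The fact that height_cm is stored as an integer, and the time at which the records were collected, are metadata, because they describe the data rather than the person.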
Now that this intellectual context is clearly defined, we can dive into the detail of how to think about data specifically, and why data transformation is such a critical step in the process of extracting meaning from the universe.