In 2002, Donald Rumsfeld, the US Secretary of Defense at the time, stated as part of a news briefing:
“We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones”
— Donald Rumsfeld, US Secretary of Defense, 2002
What is described here is dark data: information that is hidden, whether knowingly or unknowingly. Dark data is usually a problem because hidden information is not accounted for in decision-making. If we knew about such hidden information, our understanding and the resulting actions would likely change.
Gartner defines Dark Data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).
— Source: Gartner IT Glossary “Dark Data”, 2020.
Where does dark data come from?
Big data is growing exponentially, to volumes far beyond our human capacity to process and analyze. Projections suggest that by 2025 there will be 175 zettabytes of data. One zettabyte is 10^21 bytes, so that is 175 followed by 21 zeros: a lot of zeros.
Dark data is essentially a consequence of big data.