In today's data-driven world, businesses and organizations rely heavily on data to make informed decisions. Data comes in various forms, and one crucial distinction is whether it is private or public. In this case study, we explore private and public data: what each type is, how they differ, and how public data can be calibrated against private data.
Private data, also known as proprietary or confidential data, refers to information that is not publicly available and must be protected due to privacy, security, or legal concerns. This category includes personal information such as social security numbers, financial records, medical history, and proprietary business data.
Public data, on the other hand, is information that is openly available to the public and does not require protection or authorization for access. This category encompasses a wide range of data types, including government publications, news and research papers, social media posts, and publicly disclosed statutory reports.
There are a few key differences between private and public data. For this comparison, we focus on both types of data in a business context rather than on social security numbers or other personal data. In our opinion, these are the three most important differences:
Access is the biggest difference between private and public data. By its very nature, public data is easily accessible and can often be obtained without restrictions. In contrast, private data is not freely available, and if you hold it without authorization, you risk being in breach of the law. This difference in access leads into the next key difference: availability.
Restricted access to private data means that it is scarce. Public data, meanwhile, is freely available to anyone who wants to use it. This drives the huge volume of public data filling up the internet every day (and feeding the Large Language Models that underpin today's Artificial Intelligence). This availability means that public data can be used for research, analysis, and decision-making without the need for explicit consent from anyone.
The third difference is accuracy. While public data is freely available, its big limitation is accuracy. The two most common accuracy issues are incompleteness and outright errors. Therefore, whenever you use public data, you should follow a validation process to gauge its accuracy. The most common method is to find a reliable source and compare the data against it. But how do you find a reliable source? It usually comes down to skill, experience, and time.
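To make the validation idea concrete, here is a minimal sketch of the two checks described above: a completeness check (how many expected fields are filled in) and an error check (how far each value deviates from a reliable source). It assumes headcount-by-function records, a hypothetical list of expected functions, and a 10% relative tolerance; the names `HeadcountRecord`, `completeness`, and `find_errors` are illustrative and do not describe CompanySights' actual process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical list of functions a complete headcount record should cover.
EXPECTED_FUNCTIONS = ["Engineering", "Sales", "Marketing", "Operations"]


@dataclass
class HeadcountRecord:
    """Headcount by function as reported by a single source (hypothetical structure)."""
    source: str
    by_function: Dict[str, Optional[int]] = field(default_factory=dict)


def completeness(record: HeadcountRecord) -> float:
    """Share of expected functions for which the record has a value."""
    present = sum(1 for f in EXPECTED_FUNCTIONS if record.by_function.get(f) is not None)
    return present / len(EXPECTED_FUNCTIONS)


def find_errors(record: HeadcountRecord, reference: HeadcountRecord,
                tolerance: float = 0.10) -> List[str]:
    """Functions whose value differs from the reference by more than `tolerance` (relative)."""
    errors = []
    for f in EXPECTED_FUNCTIONS:
        value = record.by_function.get(f)
        ref = reference.by_function.get(f)
        if value is None or not ref:
            continue  # missing or zero values cannot be compared; completeness covers gaps
        if abs(value - ref) / ref > tolerance:
            errors.append(f)
    return errors


# Illustrative numbers only, not real company data.
reference = HeadcountRecord("reliable source", {"Engineering": 120, "Sales": 40, "Marketing": 15, "Operations": 25})
public = HeadcountRecord("public source", {"Engineering": 150, "Sales": 41, "Marketing": None, "Operations": 24})

print(completeness(public))
print(find_errors(public, reference))
```

Here the public record is missing one of four functions, and its Engineering figure is 25% above the reference, so that function would be flagged for review. The tolerance is a judgment call that would need tuning per data set.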
In the next section, we look at how CompanySights performs completeness checks to validate headcount benchmarking data gathered from public sources. We also show you how to calibrate public data sources to private data and demonstrate the high level of accuracy that can be achieved.
The best way to calibrate public data to private data (the trusted source) is through trial and error. There is no perfect process, and the approach differs depending on the data being calibrated.
The first step is to collate all available sources of public data and then validate them against a reliable source. Continuing with our example of business data, we usually treat statutory accounts and company websites as the source of truth because this data comes directly from the company itself.
By comparison, other sources such as news articles, research papers, and social media networks are what we call indirect sources, meaning the data does not come from the company itself. These sources are usually less accurate and therefore need validation, for example by comparing them with a more reliable source such as statutory accounts.
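The collate-then-validate step can be sketched as follows: group observations of one metric (say, total headcount) by source type, take the most trusted direct source as the reference, and accept or reject each indirect observation depending on how close it is to that reference. The source-type labels, the priority order (statutory accounts ahead of the company website), and the 10% tolerance are all assumptions made for illustration; `collate_and_validate` is a hypothetical helper, not part of any real library.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical source-type labels. Direct sources come from the company itself
# and are listed most trusted first; indirect sources are validated against them.
DIRECT_SOURCES = ["statutory_accounts", "company_website"]
INDIRECT_SOURCES = {"news_article", "research_paper", "social_media"}


def collate_and_validate(
    observations: List[Tuple[str, float]], tolerance: float = 0.10
) -> Tuple[Optional[float], List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Collate (source_type, value) observations for one metric.

    Returns the reference value from the most trusted direct source, plus the
    indirect observations accepted (within `tolerance`) and rejected.
    Source types not listed above are ignored.
    """
    by_source: Dict[str, List[float]] = {}
    for source, value in observations:
        by_source.setdefault(source, []).append(value)

    # Use the most trusted direct source available as the source of truth.
    reference = next((by_source[s][0] for s in DIRECT_SOURCES if s in by_source), None)
    if not reference:
        return reference, [], []  # no usable (non-zero) reference to validate against

    accepted, rejected = [], []
    for source, values in by_source.items():
        if source not in INDIRECT_SOURCES:
            continue
        for value in values:
            target = accepted if abs(value - reference) / reference <= tolerance else rejected
            target.append((source, value))
    return reference, accepted, rejected


# Illustrative numbers only, not real company data.
observations = [
    ("news_article", 310),
    ("company_website", 300),
    ("social_media", 420),
    ("statutory_accounts", 295),
]
print(collate_and_validate(observations))
```

In this example the statutory accounts supply the reference, the news figure falls within tolerance and is kept, and the social media figure is rejected. If no direct source exists, the function returns no reference, which mirrors the point that indirect data cannot be trusted on its own.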
The next step is to calibrate the validated public data to private data. Wherever private data is available, we do this to keep the two data types closely aligned. Check out the breakdown of employees by function for a European software company below, which required only minimal calibration between the validated publicly available sources and the company’s own private data.
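One simple way to picture calibration is proportional scaling: rescale the public per-function estimates so they sum to the private total headcount, then measure the remaining per-function gap against the private figures. This is a minimal sketch of that swapped-in technique, not CompanySights' method; the numbers are invented for illustration and are not the European software company's data, and `calibrate_to_total` and `calibration_gap` are hypothetical helpers.

```python
from typing import Dict

# Illustrative numbers only (not real company data): headcount by function.
public_estimate = {"Engineering": 100, "Sales": 40, "Marketing": 16, "Operations": 24}
private_actual = {"Engineering": 112, "Sales": 44, "Marketing": 18, "Operations": 26}


def calibrate_to_total(public: Dict[str, float], private_total: float) -> Dict[str, float]:
    """Proportionally scale public per-function counts so they sum to the private total."""
    scale = private_total / sum(public.values())
    return {f: n * scale for f, n in public.items()}


def calibration_gap(estimate: Dict[str, float], private: Dict[str, float]) -> float:
    """Mean absolute relative difference between estimate and private counts,
    over functions present in both with a non-zero private count."""
    common = [f for f in estimate if private.get(f)]
    return sum(abs(estimate[f] - private[f]) / private[f] for f in common) / len(common)


calibrated = calibrate_to_total(public_estimate, sum(private_actual.values()))
print({f: round(n, 2) for f, n in calibrated.items()})
print(f"gap before: {calibration_gap(public_estimate, private_actual):.1%}")
print(f"gap after:  {calibration_gap(calibrated, private_actual):.1%}")
```

Proportional scaling only corrects an overall under- or over-count; if the public data misstates the mix between functions, the per-function gap can remain or even grow, which is one reason calibration tends to involve trial and error.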
In today's data-driven landscape, distinguishing between private and public data is essential. Understanding the differences and applying appropriate safeguards to each type of data ensures responsible data use.
By calibrating public data against private data, organizations can harness the power of public data to gain trusted insights. This drives better decision-making and ultimately helps companies get ahead of their competition.
If you’re looking for calibrated headcount benchmarking data, try CompanySights for free to see how it works.