WTF is the difference between deterministic and probabilistic identity data?
By Kate Kaye
“Deterministic” and “probabilistic” identity data have become the new buzzwords in digital ad circles.
These terms have been familiar to digital advertisers, publishers and ad tech executives for years. But now that the entire industry is on the hunt for alternatives to the third-party cookie, they seem to be tossed around more frequently, especially in descriptions of how the new crop of so-called cookieless identifiers works.
Ad tech, of course, is riddled with made-up terminology. Not this time. Deterministic and probabilistic methods for making identifiable data connections have been around for years, in a variety of fields that have absolutely nothing to do with digital advertising, from public health to education to risk analysis.
Better yet: the words actually reflect their meaning. (Even better yet — no acronyms!)
What is deterministic data?
Deterministic data is information that is known to be true and accurate because it is supplied by people directly or is personally identifiable, such as names or email addresses. It’s often referred to as authenticated data.
What is probabilistic data?
Probabilistic data is based on probabilities. It is composed of individual pieces of information, such as a device’s operating system or IP address, that are compiled to puzzle together a conclusion. In the case of ad tech, probabilistic data can be used to create an identifier.
How is deterministic data used for advertising identity?
Deterministic identifiers use deterministic data to assign identity to a person who is online or using a mobile device, in order to track that identified person across websites or apps for ad targeting or measurement. The key ingredient in deterministic identity is typically information someone supplied herself, usually by logging in with a name, email address or phone number.
So, is deterministic data the same as first-party data?
Well, sometimes. First-party data gathered directly from people by a brand or publisher includes deterministic data such as names, emails or phone numbers. But first-party data also includes a variety of other information reflecting actions taken on a website, articles read, purchase transactions or other behavioral data.
So how is deterministic data used to assign identity?
Deterministic identity is achieved when an email address supplied by a publisher or advertiser is matched to the same email address in an identity graph or database of logged-in users. Or, a deterministic ID match could happen if two entities both recognize an ID and can accurately match it. Sometimes three pieces of deterministic information can be used to connect the dots. For example, if it’s known that ID1234 is firstname.lastname@example.org and firstname.lastname@example.org is ID6789, then ID1234 is a deterministic match to ID6789. Ultimately, to achieve a deterministic match, data fields must agree.
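To make the "data fields must agree" idea concrete, here is a minimal Python sketch of a deterministic match on email address. Everything in it is an assumption made for illustration: the record layout (ID and email pairs), the function names, the sample addresses and the choice to trim and lowercase emails before comparing. Real identity systems are built differently and typically do not compare raw email addresses like this.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so identical addresses compare equal.

    This normalization rule is an assumption for the sketch.
    """
    return email.strip().lower()


def deterministic_matches(records_a, records_b):
    """Return (id_a, id_b) pairs whose normalized emails agree exactly.

    Both inputs are hypothetical lists of (id, email) tuples.
    """
    # Index the second data set by email for direct lookup.
    ids_by_email = defaultdict(list)
    for id_b, email in records_b:
        ids_by_email[normalize_email(email)].append(id_b)

    matches = []
    for id_a, email in records_a:
        # A match exists only when the email field agrees exactly.
        for id_b in ids_by_email.get(normalize_email(email), []):
            matches.append((id_a, id_b))
    return matches


# Hypothetical sample data: one publisher record, two identity graph records.
publisher_records = [("ID1234", "firstname.lastname@example.org")]
graph_records = [
    ("ID6789", "Firstname.Lastname@example.org "),
    ("ID5555", "someone.else@example.org"),
]

matches = deterministic_matches(publisher_records, graph_records)
print(matches)
```

Here ID1234 pairs with ID6789 because both resolve to the same address once it is normalized, while ID5555 is left unmatched. There is no scoring or guessing involved: the fields either agree or they do not.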
So what’s probabilistic data, and how is it used for advertising?
First, a bit on why probabilistic data is used. Deterministic data is hard to come by. Very often ad tech systems can’t match identities because someone is not logged in or an email address or other piece of deterministic data is not available. When advertisers complain about low …