When it comes to pseudonymizing data, there’s one question that every data-driven organization has to ask.
How many insights do you need to sacrifice in order to meet data protection regulations?
With many pseudonymization or anonymization processes, the answer is a frustrating one: Protecting third-party data to the extent required by the law, the company’s guidelines, and the expectations of your customers often means making your data much less accessible and usable.
That’s why many companies are turning away from processes like encryption. Instead, data tokenization is becoming increasingly popular.
Why? Because tokenization, when done correctly, allows companies to carefully restrict access to sensitive data and comply with regulations, while still retaining the data’s usefulness.
It’s true that tokenization can be a powerful tool – if it’s approached carefully.
What is Tokenization?
Sensitive data, such as a customer’s Primary Account Number (PAN), is fed directly into a tokenization system, where the data is exchanged for a randomly generated alphanumeric string – this is your unique “token.”
These tokens usually preserve the format or length of the original data, but they have no mathematical relationship to it, so there is no way to derive the original value from the token itself.
A token can't be reverse-engineered; the original data can only be recovered by looking the token up in the tokenization system that generated it.
This means that the original data never needs to enter your data lake. It’s only ever stored in the tokenization system. Access to this tokenization system can be password protected, making it much easier to limit access to unprotected data.
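To make the idea concrete, here is a minimal sketch of how a vault-based tokenization system might work. The names (TokenVault, tokenize, detokenize) and the in-memory mapping are purely illustrative assumptions, not a production design; a real system would use a hardened, access-controlled store or a vaultless scheme.

```python
import secrets
import string

class TokenVault:
    """Illustrative in-memory vault mapping random tokens to original values.
    Simplified for clarity: no persistence, auditing, or access control."""

    def __init__(self):
        self._token_to_value = {}   # the original data lives only here
        self._value_to_token = {}   # lets repeated values map to the same token

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so analytics can still join on repeat occurrences.
        if value in self._value_to_token:
            return self._value_to_token[value]
        alphabet = string.ascii_uppercase + string.digits
        token = "TOK-" + "".join(secrets.choice(alphabet) for _ in range(16))
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the original value.
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("4111111111111111")   # the PAN never leaves the vault
print(token)                                  # e.g. TOK-8F2KQ7WJX1M4N9AB
print(vault.detokenize(token))                # requires access to the vault
```

In this sketch, downstream systems only ever see the token; the mapping back to the original value stays inside the vault, which is the part you lock down.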
It’s this distinction that makes tokenization such an attractive process: Organizations can use the tokenized data for analysis, while carefully restricting access to the original data.
As a result, tokenization is perfect for organizations that want to conduct extensive data analysis and make real-time, data-driven decisions while still meeting regulatory requirements.
For example, tokenizing Electronic Protected Health Information (ePHI) can help healthcare organizations comply with HIPAA while allowing them to conduct wide-scale analysis to support research, treatment plans, and policy decisions.
“Card-based payments fraud will be lower this year because of the growing adoption of 3DS2 protocols and advances in card tokenization.”
Forrester
Data security and tokenization also tend to go hand in hand for financial organizations. In fact, the tokenization process originated in payment processing under the Payment Card Industry Data Security Standard (PCI DSS).
Why Organizations Choose Tokenization
- Irreversible process
There's no logical relationship between the token and the original data, so tokens can't be reverse-engineered without access to the original tokenization system.
- Simplified compliance
If the original data, the tokenization system, and the applications using the tokens are kept sufficiently separate, use of the tokenized data is no longer subject to some privacy compliance requirements. Unlike encrypted data, tokens cannot be reverted to their original form without the original tokenization system, so there is less concern about issues like data leaks.
- Improved security
If you implement tokenization at the data source, sensitive data never enters your data lake in its original form, and those with access to the data lake have no way of reaching the original data.
- Analysis and insight
Tokens can be mapped to specific customers or patients and reused, allowing your organization to analyze repeat purchases and other patterns.
- Cost-effectiveness
Tokenization only replaces the sensitive elements rather than processing the entire data set, so fewer resources are needed.
- Flexibility
Some of the original data values can be retained when a token is generated, making it easier to complete certain processes without compromising security. For example, certain characters in a password can be retained to make it easier for customers to log in or reset their password, or a system can show the last four digits of a credit card during online transactions to help customers confirm they are paying with the right card (see the sketch after this list).
- Customer trust
Knowing that their sensitive data will never enter your data lake is reassuring for customers, and taking this extra step is an excellent way of demonstrating your commitment to protecting their information.
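As an illustration of the flexibility point above, the following sketch shows what a format-preserving token for a card number might look like, keeping only the last four digits visible. The helper name and the random-replacement approach are assumptions for the example; a real deployment would use a vetted scheme such as NIST-approved format-preserving encryption, or a vault-backed mapping as in the earlier sketch.

```python
import secrets

def format_preserving_token(pan: str, keep_last: int = 4) -> str:
    """Illustrative only: replace all but the last `keep_last` digits with
    random digits, preserving the original length, separators, and the
    recognisable last four digits."""
    digits = [c for c in pan if c.isdigit()]
    masked = [str(secrets.randbelow(10)) for _ in digits[:-keep_last]] + digits[-keep_last:]
    # Re-insert any separators (spaces, dashes) in their original positions.
    out, i = [], 0
    for c in pan:
        if c.isdigit():
            out.append(masked[i])
            i += 1
        else:
            out.append(c)
    return "".join(out)

print(format_preserving_token("4111-1111-1111-1111"))  # e.g. 8302-9447-0615-1111
```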
Tokenization Best Practices
Tokenization can be significantly more flexible and secure than other forms of anonymization or pseudonymization, but it does still have some vulnerabilities.
To mitigate these issues, you should follow tokenization best practices, including:
- Pre-token encryption
This method of protection is often used to secure data in transit before it is tokenized. The encryption used should be NIST-approved.
- Risk assessment
There's no single industry-standard form of tokenization, so merchants need to assess their own processes, such as ensuring tokens are only ever sent over secure HTTPS connections.
- Governance reviews
Access controls and policies should be reviewed regularly to ensure tokenization processes remain compliant with evolving legislation.
- Scoping the environment
Tokenization can be applied wherever your sensitive data resides, so you will need to identify how that data is distributed across systems, applications, platforms, and databases, as well as the languages used and the impact on interoperability.
- Expiry/revoke strategies
Tokens remain valid until they are changed or retired, but using them for too long increases the risk of a breach. Setting expiration policies mitigates the risks of tokens being used for longer than necessary (see the sketch after this list).
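To illustrate the expiry point above, here is a small sketch of what a token-expiration check might look like. The 90-day lifetime and the field names are assumptions made for the example; actual expiry and revocation policies should follow your own risk assessment.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed policy for the example: tokens older than 90 days are expired
# and must be revoked and re-issued.
TOKEN_LIFETIME = timedelta(days=90)

def is_expired(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    return now - issued_at > TOKEN_LIFETIME

issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
if is_expired(issued):
    # Revoke the old token in the vault and issue a replacement,
    # limiting how long any single token remains valid.
    print("token expired: revoke and re-issue")
```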
How to Choose a Data Tokenization Tool
Naturally, the data tokenization solutions available are as varied as the sources of modern data flows. Some providers focus purely on tokenization services, while others offer tokenization as part of a wider payments offering.
But how can you choose between providers?
You should ask them a few critical questions to ensure they have your compliance completely covered:
- What, if any, limitations do you place on data tokenization?
Check for any limits on unstructured or non-business data records.
- How do your tokenization processes comply with legislation?
Seek confirmation that tokenization takes place in regulatory-compliant locations, and find out whether it happens in the provider's hosted environment or can be integrated with your own infrastructure.
- What access controls and permission tools are available?
Providers should support granular, real-time permissioning aligned with NIST classification guidance, and should let you set access thresholds that trigger alerts when potential leakage or possible malicious behavior is detected.
- How are responsibilities divided between the provider and the customer?
Identify who is responsible for monitoring, securing, and verifying data requests, and what resources each party brings to the table.
- What schemes do you plan to support in the future?
Future-proof your investment by exploring the provider's plans for open banking, digital payments, and AI technologies.
“Mastercard prevented $30 billion in potential customer fraud losses through AI-powered systems.”
Mastercard
The Key to Truly Flexible Tokenization
On the surface, tokenization seems like an easy way to improve data access without sacrificing security.
But, if it is not approached correctly, tokenization can still suffer from many of the same pitfalls as other forms of anonymization or pseudonymization.
In many tokenization deployments, 'access control' still follows a fairly simplistic, monolithic approach. Whether access is controlled at the data lake or at the data tokenization tool, the decision is largely binary: either an individual always has access, or they are always denied it.
But complying with today’s data privacy and protection laws increasingly demands a more nuanced approach. It is no longer just a case of granting access to the right people; it is about ensuring the right people have access for the right purposes at the right time.
The best data protection systems aren't role-based; they are policy-based. They account for everything from the location in which data is being accessed to the intent of the person accessing it and the length of time they will need to retain it.
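To illustrate the difference, here is a rough sketch of a policy-based access decision that weighs purpose, location, and retention period rather than role alone. The attribute names, allowed values, and rules are hypothetical examples, not a description of any particular platform's policy engine.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    role: str
    purpose: str         # why the data is needed
    location: str        # where it is being accessed from
    retention_days: int  # how long it will be kept

# Hypothetical policy: access depends on purpose, location, and retention,
# not just on who is asking.
ALLOWED_PURPOSES = {"fraud_review", "customer_support"}
ALLOWED_LOCATIONS = {"EU", "US"}
MAX_RETENTION_DAYS = 30

def allow_detokenization(req: AccessRequest) -> bool:
    return (
        req.purpose in ALLOWED_PURPOSES
        and req.location in ALLOWED_LOCATIONS
        and req.retention_days <= MAX_RETENTION_DAYS
    )

print(allow_detokenization(AccessRequest("analyst", "fraud_review", "EU", 7)))  # True
print(allow_detokenization(AccessRequest("analyst", "marketing", "EU", 7)))     # False: purpose not allowed
```

A role-based system would stop at the first field; a policy-based system evaluates the whole request, which is what today's regulations increasingly expect.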
To ensure that your organization is truly seeing the benefits of tokenization – the potent combination of security and flexibility – you must combine tokenization with a solution that can provide dynamic policy-based access controls.
A system like the Velotix platform can account for evolving regulations, differing uses of data, multinational compliance requirements, and changing roles, using AI to dynamically control access across people, use cases, and countries and ensuring that the right person always has the right access at the right time.
It’s a powerful way for organizations to realize the full potential of tokenization: maximizing data value, while minimizing risk.