Managing Data Localization and Infrastructure Fragmentation
Data localization, data privacy, data regulation: there has been a lot of “data” in international headlines of late. While the European Union’s GDPR was a watershed moment for data-related legislation, it is by no means the only example. As more and more policies regulating (and protecting) personal data are introduced around the world, one thing is clear: “The Era of Borderless Data is Ending.”
What does that mean for your organization and your engineers?
Data Governance, Data Regulation, and Data Localization
Before we dive into what data regulation laws mean for individual organizations (and how to cope with a continually shifting legislative landscape), let’s talk a little bit about the terminology and unifying ideas behind global data policies.
“Data regulation” is a broad set of local, national, and international policies and laws ensuring that processed data is shared and/or governed appropriately. Data regulation can include stipulations for “data residency,” which refers to the physical or geographic location of an organization’s data or information; “data localization,” which requires data about a nation’s citizens or residents to be collected, processed, and/or stored inside the country; and “data sovereignty,” the idea that information is subject to the laws and governance structures of the nation where it is collected.
Often, data regulation falls under the larger umbrella of “data governance” but tips toward the political, rather than corporate, end of the spectrum. Some large enterprises may have their own internal policies about how customer data should be managed above and beyond political mandates, but generally speaking when we say “data regulation” we mean it in the legislative sense. Think General Data Protection Regulation (GDPR).
While the details of any policy (and its enforcement) may vary from state to state or country to country, data regulation laws are usually passed with a focus on consumer protection and privacy. This sometimes takes the form of clauses like the “right to be forgotten” or by codifying an individual’s right to access the information an organization collects about them.
“Data locality” can refer to two different concepts. In computer science, data locality can refer to the process of moving computational resources to data, rather than moving data to computational resources. In policy, data locality, data localization, and data residency all refer to the geographical location where data is stored and, consequently, the legal jurisdiction in which it can or cannot reside. Unsurprisingly there’s a lot of overlap between the two concepts, as data localization laws can call for the use of data locality solutions.
Current Data Regulations
While some have decried the “tyranny of GDPR compliance measures”, the number of countries directly emulating the EU with their own data privacy laws continues to grow. Data regulation enjoys bipartisan support from a majority of Americans, and a whopping 40% of Europeans don’t want to share any of their personal data with any private companies. In sum, data regulations are here to stay.
GDPR and the Cloud
While the GDPR isn’t the first data protection law, it is one of the most comprehensive and influential. Many of the data regulations currently on the docket take a page from GDPR’s book, so it’s appropriate to examine how the GDPR handles data residency.
There’s a common misconception that any personally identifiable information (PII) gathered in the EU needs to stay in the EU. PII can leave the boundaries of the EU; however, if it does, the people whose data has been collected need to be informed and allowed to opt out. Their data also needs to be tracked, secured, and protected by every organization that may process it, and if any data is potentially disclosed, the impacted individuals need to be informed. The logistical complexity of requiring consent and a fully compliant information supply chain means moving PII out of the EU is inadvisable at best and, if the destination is the U.S., it can be outright illegal.
In a landmark ruling back in July 2020, the Court of Justice of the EU (CJEU) invalidated the EU-U.S. Privacy Shield, the framework many companies relied on to move personal data to cloud services hosted in the U.S. In practice, this means that if American companies are processing or storing any data from European customers, the safest course is for that storage and processing to happen inside the borders of the EU or another country deemed to have an adequate level of data protection, such as New Zealand, Japan, Switzerland, or Canada.
Former Hamburg Commissioner for Data Protection and Freedom of Information, Johannes Caspar, put it best: “Difficult times are looming for international data traffic.”
Following the EU’s Data Privacy Lead
The vast majority of countries already have some kind of data privacy legislation on the books, with many planning on expanding the protections those extant laws offer. With both legal precedent and general public approval, it seems inevitable that today’s policies represent the beginning of a larger push toward global data regulation.
“137 out of 194 countries had put in place legislation to secure the protection of data and privacy.”
- United Nations Conference on Trade and Development
Here are a few policies of note, many inspired by the GDPR.
Brazil: Lei Geral de Proteção de Dados (LGPD)
Brazil’s LGPD is patterned directly after the GDPR in terms of scope and scale, with provisions related to consent, data processing by third parties, and record keeping. While the penalties for noncompliance are less harsh than those under the GDPR, noncompliant companies may still face multi-million dollar fines.
Like the GDPR, the LGPD applies to any legal entity that collects or processes personal data in Brazil, or that uses that data to offer or sell goods and services within Brazil. The organization doesn’t need to be based in Brazil; any activity within Brazil’s jurisdiction falls under the LGPD.
China: Personal Information Protection Law (PIPL)
PIPL is an expansion of China’s 2017 Cyber Security Law and strengthens its extra-territorial reach. Any processing of PII outside China triggers PIPL’s application when the purpose of the processing is to provide products or services to people within Chinese borders or when the processing is for analyzing or assessing the behaviors of people within Chinese borders.
Under PIPL, all companies that do business in China (even those with no physical presence within the country) must comply or be subject to legal penalties and fines.
India: Storage of Payment System Data
While India recently withdrew its Personal Data Protection Bill, its strict payment processing laws remain and continue to inspire increased regulations within Indian borders. The Reserve Bank of India requires that all payment system providers, banks, intermediaries, payment gateways, and third party vendors in the payments ecosystem who do payment processing outside of India return payment data to India no later than one business day or 24 hours from processing.
Additionally, payment data must be deleted after transfer and can only be stored in India.
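For engineering teams, a rule like this tends to end up as a deadline check somewhere in a pipeline. The sketch below is a minimal illustration only: it simplifies “one business day or 24 hours” to a flat 24-hour window, and the names `RETURN_WINDOW`, `return_deadline`, and `is_overdue` are hypothetical rather than part of any real payments library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the rule is simplified to a flat 24 hours from processing.
RETURN_WINDOW = timedelta(hours=24)


def return_deadline(processed_at: datetime) -> datetime:
    """Latest moment by which payment data should be back in India."""
    if processed_at.tzinfo is None:
        # Naive timestamps are ambiguous across regions, so reject them.
        raise ValueError("processed_at must be timezone-aware")
    return processed_at + RETURN_WINDOW


def is_overdue(processed_at: datetime, now: datetime) -> bool:
    """True if the return window has already elapsed."""
    return now > return_deadline(processed_at)
```

Requiring timezone-aware timestamps is a deliberate choice here, since the processing and the deadline check may run in different regions.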
Japan: Act on Protection of Personal Information (APPI)
Japan’s expansive data protection legislation is so similar to the GDPR that the European Commission and Japan have a “reciprocal adequacy” agreement to permit the sharing and transfer of personal data between the two regions.
Much like GDPR, the APPI applies to both foreign and domestic companies that process the data of Japanese citizens, regardless of where in the world they maintain a physical presence.
United States: California Consumer Privacy Act (CCPA)
The United States has no blanket data protection policy; however, California’s CCPA represents some of the strictest state-level guidelines for the management of PII. Opt-out and disclosure provisions mirror parts of the GDPR, as does the act’s extra-territorial nature (liability may also apply to overseas businesses that ship items into California).
Fifteen more states are considering CCPA-type bills in 2022.
Not all current global data regulations are as strict as the GDPR, but there is a clear pattern of requiring consumer data to be processed and stored in the country of origin. Following this trend, we can expect that data localization will create a challenge for anyone operating in more than one country or, in the U.S.’s case, more than one state.
Even if your organization does not intend to serve an international customer base, allowing users from the EU, Brazil, Japan, India, China, or any of the 137 countries with data protection laws to access your platform makes you subject to all applicable rules and regulations.
What Does Data Localization Mean for Your Application?
Due to the extra-territorial nature of most data privacy legislation, there’s no way to avoid grappling with compliance short of cutting off service for most of the global population. Unless that’s an acceptable loss for your organization (which may well be the case) you’ll likely need to deal with each regulation on the local, regional, national, and international level.
While ensuring data compliance includes hundreds of hours of work across departments — from marketing to finance — we’re going to focus on what laws like the GDPR mean for engineering teams.
But first let’s get this out of the way: we’re not lawyers, and the contents of this blog are purely informational and do not constitute legal advice. Seaplane is a great tool for helping your organization comply with data regulations (more on that later!), and we’re happy to share our experience as a global company contending with many of the regulations we are discussing here today, but you’ll definitely want to talk to appropriate legal counsel to ensure organizational compliance.
Engineering for Data Locality
Most GDPR-inspired laws functionally require that customer data is processed and stored in the region where that data originates. This is a bit of an oversimplification, but given the logistical lift of allowing customers to opt-out of their data being transferred, it’s the practical outcome. Largely speaking if an organization has EU traffic, that data is staying in the EU.
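As a rough illustration of what “data stays where it originates” can look like in application code, the sketch below maps a user’s country code to a home region and refuses to fall back to a default. The region names, the `HOME_REGION` table, and the `resolve_region` helper are all hypothetical, and the mapping is an example rather than legal guidance.

```python
# Hypothetical mapping from ISO country codes to deployment regions.
# These region names are illustrative assumptions, not real provider regions.
HOME_REGION = {
    "DE": "eu",
    "FR": "eu",
    "BR": "brazil",
    "JP": "japan",
    "IN": "india",
    "US": "us",
}


class UnknownJurisdictionError(Exception):
    """Raised when no region has been approved for a country."""


def resolve_region(country_code: str) -> str:
    """Return the region where this user's data should be stored and processed."""
    try:
        return HOME_REGION[country_code.upper()]
    except KeyError:
        # Fail closed: never silently route data to a default region.
        raise UnknownJurisdictionError(country_code) from None
```

Failing closed on an unknown country is one possible design choice; it trades availability for the guarantee that data never lands in a jurisdiction nobody has reviewed.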
Knowing this, data compliance should start at the most foundational level with region-specific infrastructure.
If building dedicated infrastructure within the borders of every legal jurisdiction in which your organization operates sounds like a major lift, that’s because it is a major lift. Supporting multiple regions is definitely possible, but going from supporting one or two regions and a failover to juggling potentially dozens of zones and regions around the world is no easy task. The expense of duplicating infrastructure combined with managing all the fiddly bits that make everything work — routing, load balancing, data stores, compute — results in a lot of resource drain. It also introduces more points of failure as every additional deployment requires monitoring and, in case of outages, rescheduling.
Over the course of our ongoing user research we came upon multiple examples of well-meaning engineers digging themselves into a compliance hole that grew deeper and deeper with every update and expansion. Entire teams were stuck supporting a wide range of custom deployments, unable to develop new features while fretting about what new data regulation came next.
The future is, inevitably, fragmented.
Coping with Infrastructure Fragmentation
“Infrastructure fragmentation” sounds ominous, but the concept of having different infrastructure for different deployments isn’t a problem in and of itself. It’s okay to have some amount of fragmentation — the issue arises when that fragmentation is difficult or impossible to manage.
Option 1: Multi-region DIY
There are plenty of solutions to data locality that can and do work given enough time, money, and expertise. Every major cloud provider offers example architectures and supporting services for multi-region deployments, meaning you can almost certainly DIY a multi-region setup to deal with data compliance everywhere your company operates.
Infrastructure as Code (IaC) tools can be used to give app-level developers some control over their deployments and provisioning, empowering them to comply on an individual level. This empowerment also confers a heavy responsibility, as it requires app-level developers to be aware of data jurisdiction laws and react accordingly.
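One common way to keep per-region definitions manageable is to generate them from a single base plus small overrides. The sketch below shows that idea in plain Python, not in any particular IaC tool; the service name, image, datastore names, and the `render_configs` helper are hypothetical placeholders.

```python
# Hypothetical base settings shared by every regional deployment.
BASE = {"service": "checkout", "replicas": 2, "image": "example/checkout:1.0"}

# Hypothetical per-region overrides; only the differences are listed.
OVERRIDES = {
    "eu": {"datastore": "eu-db", "replicas": 4},
    "brazil": {"datastore": "br-db"},
    "japan": {"datastore": "jp-db"},
}


def render_configs(base: dict, overrides: dict) -> dict:
    """Merge the base settings with each region's overrides."""
    configs = {}
    for region, extra in overrides.items():
        # Later keys win, so a region's overrides replace base values.
        configs[region] = {**base, "region": region, **extra}
    return configs
```

Calling `render_configs(BASE, OVERRIDES)` yields one complete config per region, each pinned to its own datastore, while the shared settings live in a single place.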
Adding additional regions is where it gets more complicated. Maintaining three, four, five regions or more can quickly become prohibitively expensive, and adding additional cloud providers to the mix only creates more chaos. What starts as a straightforward solution morphs into hundreds of lines of YAML and thousands of developer headaches. It can be done, but it can’t be done easily or for cheap, and almost certainly requires a platform team or some dedicated DevOps engineers to execute.
Ultimately, DIY is possible. However, the total operational cost of building and maintaining a solution entirely in-house should be taken into consideration. Large organizations with thousands of engineers are far more likely to find success than smaller or rapidly scaling organizations.
Option 2: Consultants
This solution is right in the title. While bringing in consulting firms can limit your organization’s ability to react to legislative changes in real time, it’s a good choice for companies that have money but lack manpower. There are hundreds of excellent consultants and agencies out in the world, plenty of which specialize in data compliance.
Admittedly, this can be an expensive choice that requires a large investment up front and won’t totally solve for fragmented infrastructure. Even the best engineered system built by experienced professionals will require spinning up new machines for every project. While a lot of this work can be automated, it will still include a lot of YAML and introduces the risk of failures every step of the way — failures which can be especially difficult to manage when the consultant is long gone.
If your scope is limited and you have the time you need to closely collaborate with an outside party, then consultants could be the right solution for you.
Option 3: Seaplane
You might have seen this one coming, but we’ll do our best to be unbiased!
Seaplane is a global control plane that uses the optimal combination of public clouds, bare metal providers, and edge resources to deliver applications when and where they are needed anywhere in the world. If your users are geographically spread across Europe and the U.S., your application will automatically deploy in each location and scale horizontally and vertically to support the load. If there is no traffic in any given region, Seaplane will spin your deployment down to zero ensuring you only pay for resources you actually use.
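To make “scale to zero” concrete, here is a generic sketch of the decision. It is not Seaplane’s API or implementation; the function names and the idea of a fixed per-replica capacity are assumptions made purely for illustration.

```python
import math


def desired_replicas(requests_per_second: float, capacity_per_replica: float) -> int:
    """Replicas needed to serve the load; zero when a region has no traffic."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    if requests_per_second <= 0:
        return 0  # scale to zero: no traffic, no running resources
    return math.ceil(requests_per_second / capacity_per_replica)


def plan(traffic_by_region: dict, capacity_per_replica: float) -> dict:
    """Compute a replica count for every region independently."""
    return {
        region: desired_replicas(rps, capacity_per_replica)
        for region, rps in traffic_by_region.items()
    }
```

For example, `plan({"eu": 250, "us": 0}, 100)` asks for three replicas in the EU and none in the U.S.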
For your average team member, this means shipping is as simple as deploying on a single cloud zone. For your team leads, this means defining a set of business rules that will apply to all deployed applications, giving leadership the tools they need to quickly and easily comply with all current and future data regulations without requiring special infrastructure for each jurisdiction. This unobtrusively ensures compliance without increasing overhead, removing responsibility from individual developers who probably shouldn’t be contending with complex legal statutes in the first place.
Much like the other options in this section, Seaplane comes with caveats. Most critically, your application needs to be containerized to use Seaplane. Seaplane also does not (currently) integrate with on-prem resources, so while Seaplane could be a good option for a greenfield project, we’re not the best solution for organizations that are dedicated to using their own servers.
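The idea of organization-wide business rules can be sketched independently of any product. The example below is a hypothetical policy check, not Seaplane’s rule format: the data classes, the `ALLOWED_REGIONS` table, and the `violations` helper are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    app: str
    data_class: str
    regions: frozenset


# Hypothetical rules: which regions each class of data may be deployed to.
# None means the data class is unrestricted.
ALLOWED_REGIONS = {
    "eu_personal": frozenset({"eu"}),
    "india_payments": frozenset({"india"}),
    "public": None,
}


def violations(deployment: Deployment) -> list:
    """Return the regions this deployment targets but is not allowed to use."""
    # Unknown data classes get an empty allow-list, so every region is flagged.
    allowed = ALLOWED_REGIONS.get(deployment.data_class, frozenset())
    if allowed is None:
        return []
    return sorted(deployment.regions - allowed)
```

A deployment of EU personal data targeting both `eu` and `us` would report `["us"]`, which a release pipeline could treat as a blocking error before anything ships.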
If you’re interested in learning more (or have a project in mind) please reach out to request access.
Data regulation isn’t going anywhere, but compliance doesn’t have to be so painful. By investing early in the right solution for your organization, you can avoid the worst of infrastructure fragmentation all while setting your engineering team up for more success (and less YAML) in the future.