Shanghai police-database breach exposes lax data protection

The latest leak serves as a reminder that Chinese authorities’ obsession with harvesting data on citizens undermines the security of personal information. It also limits data quality, which restricts automatic uses of surveillance data, say Antonia Hmaidi and Rebecca Arcesati.

The apparent leak of huge amounts of sensitive data from Shanghai’s police database shows the dangers of China’s hunger for data about its citizens. The hackers claim to have data on one billion people. After wiping the original database and posting a ransom note for 10 Bitcoin, at the time roughly the equivalent of 350,000 euros, the hackers put their data haul up for sale in an online forum that serves as a marketplace for leaks like this. Journalists who checked the data samples the hackers released found the incidents and personal details in them to be correct.

Leak reveals how poorly authorities in China look after public data

This is the latest and largest in a series of leaks from official databases, which have exposed, among other things, the personal information of two million Communist Party members in Shanghai and the facial data of thousands of Beijing residents. They reveal how poorly authorities look after public data as they put state security above data protection. Under Xi Jinping, China has pushed the wide use of data and come to harness ever more of it for surveillance. Mass collection of personal data is key, the goal being to use predictive policing to protect the rule of the Chinese Communist Party (CCP) from any perceived threats, from political dissidents to criminalized minorities such as the Uyghurs.

The Shanghai police database did not observe basic data-security and data-quality rules. It was accessible without any sort of credentials, and key software had not been updated regularly as it should be – its Elasticsearch database software dated from 2017. According to LeakIX (host 101.89.99.234), the database was freely accessible for more than a year. On top of that, data quality and data integration are poor. For example, the data uses three different date formats, limiting its potential usefulness. As a result, retrieving data from a specific time period would require someone to write custom code by hand.
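
To illustrate why mixed date formats get in the way of automated use, here is a minimal Python sketch of the kind of code someone would have to write by hand before filtering records by time period. The three formats listed are assumptions chosen for illustration; the leaked samples are not documented here, and the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Assumed formats for illustration only; the actual three formats
# found in the leaked data are not specified in this article.
ASSUMED_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def parse_mixed_date(raw: str) -> Optional[date]:
    """Try each assumed format in turn; return None if none matches."""
    text = raw.strip()
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def in_period(raw: str, start: date, end: date) -> bool:
    """True if the raw value parses and falls within [start, end]."""
    parsed = parse_mixed_date(raw)
    return parsed is not None and start <= parsed <= end
```

Without a normalization step like this, a simple range query would silently miss every record stored in a format the query did not anticipate.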

Chinese efforts to harvest data for predictive policing are still in their infancy

In addition, the sample contains a lot of missing and obviously wrong data. For example, 76 percent of people appearing in it were not ascribed an ethnicity – and ethnicities that were recorded were not described consistently. There also seem to be errors with age information: only a two-digit code for the year was recorded, resulting in some dates of birth being matched to the early 1900s. Unknown ages were coded as 00, making a lot of people appear to be aged over 100.

All this confirms previous evidence that China’s efforts to harvest citizen data for predictive policing are still in their infancy, with a focus on data quantity rather than quality. In fact, Beijing is trying hard to promote greater integration of surveillance data.

It is hard to see how Chinese citizens will continue to trust authorities with their data without clear consequences for insufficient protection. China’s obsession with amassing citizens’ personal information for surveillance undermines the formation of a culture of data protection and security. The 2021 Personal Information Protection Law (PIPL) was enacted mainly to address public concerns around privacy violations by commercial actors, but it also created new obligations for authorities. These authorities include the Ministry of Public Security (MPS), the body responsible for police in China, even though the law provides for broad exemptions for law enforcement and state security.

It seems unlikely that affected citizens can obtain redress

The central government’s response to the leak – or lack thereof – will show how serious it is about holding state bureaucracies accountable. When state organs violate the PIPL, article 68 states that higher-ranking authorities or the departments responsible for data protection duties are meant to intervene. Also relevant is a provision in article 71 for situations where mishandling of personal data amounts to a “violation of public security management,” which in Chinese law brings with it administrative penalties and even compensation.

In the case of large-scale privacy infringements, article 70 gives citizens the opportunity to file a class action. Since 2015, the illegal sale or provision of personal information by government personnel has been considered a criminal offense, although it is unclear whether the Shanghai data can be said to have been provided to third parties. While some heads may roll at the MPS, it seems unlikely that the affected citizens can obtain redress. Public reactions to the leak on the Chinese internet were promptly censored and authorities have yet to acknowledge the incident.

The continued leaks from big, government-controlled databases suggest that the CCP is for now unable or unwilling to shore up data security, even though with the Data Security Law it has made the issue a priority of legislative and regulatory action. Part of the problem is that China’s longtime prioritization of censorship and surveillance remains at odds with the high standards of digital security it wishes to achieve.

There appear to be plenty of vulnerabilities in official databases

China’s eye-catching string of data leaks also suggests some level of dissatisfaction with the CCP’s approach to data governance – the so-called Xinjiang Papers about Beijing’s mass internment of the Uyghur minority in the region, for example, were reportedly initially offered for publication by an official in the Chinese security apparatus.

While the data leak about Xinjiang appears to have been an inside job targeting properly secured data, there appear to be plenty of data-security vulnerabilities in official databases that can be exploited by those who have the technical background and are dissatisfied with CCP rule. A group calling itself CCP Unmasked leaked data from the Cyberspace Administration of China showing how it used “paid internet trolls” to censor coronavirus information. Hackers, some of whom are anti-CCP, drive a thriving black market for official data in China. In 2014, hackers broke into a TV station in Wenzhou and displayed anti-CCP messages on air.
