Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach

The data analytics firm that worked with Donald Trump’s election team and the winning Brexit campaign harvested millions of Facebook profiles of US voters, in one of the tech giant’s biggest ever data breaches, and used them to build a powerful software program to predict and influence choices at the ballot box.

A whistleblower has revealed to the Observer how Cambridge Analytica – a company owned by the hedge fund billionaire Robert Mercer, and headed at the time by Trump’s key adviser Steve Bannon – used personal information taken without authorisation in early 2014 to build a system that could profile individual US voters, in order to target them with personalised political advertisements.

Christopher Wylie, who worked with a Cambridge University academic to obtain the data, told the Observer: “We exploited Facebook to harvest millions of people’s profiles. And built models to exploit what we knew about them and target their inner demons. That was the basis the entire company was built on.”

Documents seen by the Observer, and confirmed by a Facebook statement, show that by late 2015 the company had found out that information had been harvested on an unprecedented scale. However, at the time it failed to alert users and took only limited steps to recover and secure the private information of more than 50 million individuals.

The New York Times is reporting that copies of the data harvested for Cambridge Analytica could still be found online; its reporting team had viewed some of the raw data.

The data was collected through an app called thisisyourdigitallife, built by academic Aleksandr Kogan, separately from his work at Cambridge University. Through his company Global Science Research (GSR), in collaboration with Cambridge Analytica, hundreds of thousands of users were paid to take a personality test and agreed to have their data collected for academic use.

However, the app also collected the information of the test-takers’ Facebook friends, leading to the accumulation of a data pool tens of millions-strong. Facebook’s “platform policy” allowed only collection of friends’ data to improve user experience in the app and barred it being sold on or used for advertising. The discovery of the unprecedented data harvesting, and the use to which it was put, raises urgent new questions about Facebook’s role in targeting voters in the US presidential election. It comes only weeks after indictments of 13 Russians by the special counsel Robert Mueller which stated they had used the platform to perpetrate “information warfare” against the US.

Cambridge Analytica and Facebook are one focus of an inquiry into data and politics by the British Information Commissioner’s Office. Separately, the Electoral Commission is also investigating what role Cambridge Analytica played in the EU referendum.

“We are investigating the circumstances in which Facebook data may have been illegally acquired and used,” said the information commissioner Elizabeth Denham. “It’s part of our ongoing investigation into the use of data analytics for political purposes which was launched to consider how political parties and campaigns, data analytics companies and social media platforms in the UK are using and analysing people’s personal information to micro-target voters.”

On Friday, four days after the Observer sought comment for this story, but more than two years after the data breach was first reported, Facebook announced that it was suspending Cambridge Analytica and Kogan from the platform, pending further information over misuse of data. Separately, Facebook’s external lawyers warned the Observer it was making “false and defamatory” allegations, and reserved Facebook’s legal position.

The revelations provoked widespread outrage. The Massachusetts Attorney General Maura Healey announced that the state would be launching an investigation. “Residents deserve answers immediately from Facebook and Cambridge Analytica,” she said on Twitter.

The Democratic senator Mark Warner said the harvesting of data on such a vast scale for political targeting underlined the need for Congress to improve controls. He has proposed an Honest Ads Act to regulate online political advertising the same way as television, radio and print. “This story is more evidence that the online political advertising market is essentially the Wild West. Whether it’s allowing Russians to purchase political ads, or extensive micro-targeting based on ill-gotten user data, it’s clear that, left unregulated, this market will continue to be prone to deception and lacking in transparency,” he said.

Last month both Facebook and the CEO of Cambridge Analytica, Alexander Nix, told a parliamentary inquiry on fake news: that the company did not have or use private Facebook data.

Simon Milner, Facebook’s UK policy director, when asked if Cambridge Analytica had Facebook data, told MPs: “They may have lots of data but it will not be Facebook user data. It may be data about people who are on Facebook that they have gathered themselves, but it is not data that we have provided.”

Cambridge Analytica’s chief executive, Alexander Nix, told the inquiry: “We do not work with Facebook data and we do not have Facebook data.”

Wylie, a Canadian data analytics expert who worked with Cambridge Analytica and Kogan to devise and implement the scheme, showed a dossier of evidence about the data misuse to the Observer which appears to raise questions about their testimony. He has passed it to the National Crime Agency’s cybercrime unit and the Information Commissioner’s Office. It includes emails, invoices, contracts and bank transfers that reveal more than 50 million profiles – mostly belonging to registered US voters – were harvested from the site in one of the largest-ever breaches of Facebook data. Facebook on Friday said that it was also suspending Wylie from accessing the platform while it carried out its investigation, despite his role as a whistleblower.

At the time of the data breach, Wylie was a Cambridge Analytica employee, but Facebook described him as working for Eunoia Technologies, a firm he set up on his own after leaving his former employer in late 2014.

The evidence Wylie supplied to UK and US authorities includes a letter from Facebook’s own lawyers sent to him in August 2016, asking him to destroy any data he held that had been collected by GSR, the company set up by Kogan to harvest the profiles.

That legal letter was sent several months after the Guardian first reported the breach and days before it was officially announced that Bannon was taking over as campaign manager for Trump and bringing Cambridge Analytica with him.

“Because this data was obtained and used without permission, and because GSR was not authorised to share or sell it to you, it cannot be used legitimately in the future and must be deleted immediately,” the letter said.

Facebook did not pursue a response when the letter initially went unanswered for weeks because Wylie was travelling, nor did it follow up with forensic checks on his computers or storage, he said.

“That to me was the most astonishing thing. They waited two years and did absolutely nothing to check that the data was deleted. All they asked me to do was tick a box on a form and post it back.”

Paul-Olivier Dehaye, a data protection specialist, who spearheaded the investigative efforts into the tech giant, said: “Facebook has denied and denied and denied this. It has misled MPs and congressional investigators and it’s failed in its duties to respect the law.

“It has a legal obligation to inform regulators and individuals about this data breach, and it hasn’t. It’s failed time and time again to be open and transparent.”

A majority of American states have laws requiring notification in some cases of data breach, including California, where Facebook is based.

Facebook denies that the harvesting of tens of millions of profiles by GSR and Cambridge Analytica was a data breach. It said in a statement that Kogan “gained access to this information in a legitimate way and through the proper channels” but “did not subsequently abide by our rules” because he passed the information on to third parties.

Facebook said it removed the app in 2015 and required certification from everyone with copies that the data had been destroyed, although the letter to Wylie did not arrive until the second half of 2016. “We are committed to vigorously enforcing our policies to protect people’s information. We will take whatever steps are required to see that this happens,” Paul Grewal, Facebook’s vice-president, said in a statement. The company is now investigating reports that not all data had been deleted.

Kogan, who has previously unreported links to a Russian university and took Russian grants for research, had a licence from Facebook to collect profile data, but it was for research purposes only. So when he hoovered up information for the commercial venture, he was violating the company’s terms. Kogan maintains everything he did was legal, and says he had a “close working relationship” with Facebook, which had granted him permission for his apps.

The Observer has seen a contract dated 4 June 2014, which confirms SCL, an affiliate of Cambridge Analytica, entered into a commercial arrangement with GSR, entirely premised on harvesting and processing Facebook data. Cambridge Analytica spent nearly $1m on data collection, which yielded more than 50 million individual profiles that could be matched to electoral rolls. It then used the test results and Facebook data to build an algorithm that could analyse individual Facebook profiles and determine personality traits linked to voting behaviour.

The algorithm and database together made a powerful political tool. It allowed a campaign to identify possible swing voters and craft messages more likely to resonate.

“The ultimate product of the training set is creating a ‘gold standard’ of understanding personality from Facebook profile information,” the contract specifies. It promises to create a database of 2 million “matched” profiles, identifiable and tied to electoral registers, across 11 states, but with room to expand much further.

At the time, more than 50 million profiles represented around a third of active North American Facebook users, and nearly a quarter of potential US voters. Yet when asked by MPs if any of his firm’s data had come from GSR, Nix said: “We had a relationship with GSR. They did some research for us back in 2014. That research proved to be fruitless and so the answer is no.”

Cambridge Analytica said that its contract with GSR stipulated that Kogan should seek informed consent for data collection and it had no reason to believe he would not.

GSR was “led by a seemingly reputable academic at an internationally renowned institution who made explicit contractual commitments to us regarding its legal authority to license data to SCL Elections”, a company spokesman said.

SCL Elections, an affiliate, worked with Facebook over the period to ensure it was satisfied no terms had been “knowingly breached” and provided a signed statement that all data and derivatives had been deleted, he said. Cambridge Analytica also said none of the data was used in the 2016 presidential election.

Steve Bannon’s lawyer said he had no comment because his client “knows nothing about the claims being asserted”. He added: “The first Mr Bannon heard of these reports was from media inquiries in the past few days.” He directed inquires to Nix.