How the U.S. Uses Technology to Mine More Data More Quickly
By JAMES RISEN and ERIC LICHTBLAU
WASHINGTON — When American analysts hunting terrorists sought new ways
to comb through the troves of phone records, e-mails and other data
piling up as digital communications exploded over the past decade, they
turned to Silicon Valley computer experts who had developed complex
equations to thwart Russian mobsters intent on credit card fraud.
The partnership between the intelligence community and Palantir Technologies, a Palo Alto, Calif., company founded by a group of inventors from PayPal, is just one of many that the National Security Agency and other agencies have forged as they have rushed to unlock the secrets of “Big Data.”
Today, a revolution in software technology that allows for the highly
automated and instantaneous analysis of enormous volumes of digital
information has transformed the N.S.A., turning it into the virtual
landlord of the digital assets of Americans and foreigners alike. The
new technology has, for the first time, given America’s spies the
ability to track the activities and movements of people almost anywhere
in the world without actually watching them or listening to their
conversations.
New disclosures that the N.S.A. has secretly acquired the phone records
of millions of Americans and access to e-mails, videos and other data of
foreigners from nine United States Internet companies have provided a
rare glimpse into the growing reach of the nation’s largest spy agency.
They have also alarmed the government: on Saturday night, Shawn Turner, a
spokesman for the director of national intelligence, said that “a
crimes report has been filed by the N.S.A.”
With little public debate, the N.S.A. has been undergoing rapid
expansion in order to exploit the mountains of new data being created
each day. The government has poured billions of dollars into the agency
over the last decade, building a one-million-square-foot fortress in the
mountains of Utah, apparently to store huge volumes of personal data
indefinitely. It created intercept stations across the country,
according to former industry and intelligence officials, and helped
build one of the world’s fastest computers to crack the codes that
protect information.
While once the flow of data across the Internet appeared too
overwhelming for the N.S.A. to keep up with, the recent revelations suggest
that the agency’s capabilities are now far greater than most outsiders
believed. “Five years ago, I would have said they don’t have the
capability to monitor a significant amount of Internet traffic,” said Herbert S. Lin,
an expert in computer science and telecommunications at the National
Research Council. Now, he said, it appears “that they are getting close
to that goal.”
On Saturday, it became clear how close: Another N.S.A. document, again
cited by The Guardian, showed a “global heat map” that appeared to
represent how much data the N.S.A. sweeps up around the world. It showed
that in March 2013 there were 97 billion pieces of data collected from
networks worldwide; about 14 percent of it was in Iran, much was from
Pakistan and about 3 percent came from inside the United States, though
some of that might have been foreign data traffic routed through
American-based servers.
A Shift in Focus
The agency’s ability to efficiently mine metadata, data about who is
calling or e-mailing, has made wiretapping and eavesdropping on
communications far less vital, according to data experts. That access to
data from companies that Americans depend on daily raises troubling
questions about privacy and civil liberties that officials in
Washington, insistent on near-total secrecy, have yet to address.
“American laws and American policy view the content of communications as
the most private and the most valuable, but that is backwards today,”
said Marc Rotenberg, the executive director of the Electronic Privacy Information Center,
a Washington group. “The information associated with communications
today is often more significant than the communications itself, and the
people who do the data mining know that.”
In the 1960s, when the N.S.A. successfully intercepted the primitive car
phones used by Soviet leaders driving around Moscow in their Zil
limousines, there was no chance the agency would accidentally pick up
Americans. Today, if it is scanning for a foreign politician’s Gmail
account or hunting for the cellphone number of someone suspected of
being a terrorist, the possibilities for what the N.S.A. calls “incidental”
collection of Americans are far greater.
United States laws restrict wiretapping and eavesdropping on the actual
content of the communications of American citizens but offer very little
protection to the digital data thrown off by the telephone when a call
is made. And they offer virtually no protection to other forms of
non-telephone-related data like credit card transactions.
Because of smartphones, tablets, social media sites, e-mail and other
forms of digital communications, the world creates 2.5 quintillion bytes
of new data daily, according to I.B.M.
The company estimates that 90 percent of the data that now exists in the
world has been created in just the last two years. From now until 2020,
the digital universe is expected to double every two years, according
to a study by the International Data Corporation.
Accompanying that explosive growth has been rapid progress in the ability to sift through the information.
When separate streams of data are integrated into large databases —
matching, for example, time and location data from cellphones with
credit card purchases or E-ZPass use — intelligence analysts are given a
mosaic of a person’s life that would never be available from simply
listening to their conversations. Just four data points about the
location and time of a mobile phone call, a study published in Nature
found, make it possible to identify the caller 95 percent of the time.
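The point about a handful of location-and-time observations singling out a caller can be illustrated with a toy calculation. The sketch below is not the method of the study cited above; it uses made-up traces (the names, towers and hours are all hypothetical) and simply counts how often a sample of k points from one person's trace matches nobody else.

```python
from itertools import combinations

# Hypothetical toy data: user -> set of (cell_tower, hour) observations.
TRACES = {
    "alice": {("T1", 8), ("T2", 9), ("T3", 18), ("T1", 22)},
    "bob":   {("T1", 8), ("T2", 9), ("T4", 18), ("T5", 22)},
    "carol": {("T1", 8), ("T6", 9), ("T3", 18), ("T1", 22)},
}

def matching_users(traces, points):
    """Return the users whose trace contains every observed point."""
    observed = set(points)
    return [user for user, trace in traces.items() if observed <= trace]

def unique_fraction(traces, k):
    """Fraction of k-point samples, drawn from each user's own trace,
    that match exactly one user (the one they were drawn from)."""
    total = unique = 0
    for user, trace in traces.items():
        for sample in combinations(sorted(trace), k):
            total += 1
            if matching_users(traces, sample) == [user]:
                unique += 1
    return unique / total

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, unique_fraction(TRACES, k))
```

In this toy data a single point rarely pins down one person, while all four points always do. Real traces are far larger and noisier, so the numbers here say nothing about the 95 percent figure itself.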
“We can find all sorts of correlations and patterns,” said one
government computer scientist who spoke on condition of anonymity
because he was not authorized to comment publicly. “There have been
tremendous advances.”
Secret Programs
When President George W. Bush secretly began the N.S.A.’s warrantless
wiretapping program in October 2001, to listen in on the international
telephone calls and e-mails of American citizens without court approval,
the program was accompanied by large-scale data mining operations.
Those secret programs prompted a showdown in March 2004 between Bush
White House officials and a group of top Justice Department and F.B.I.
officials in the hospital room of John Ashcroft, then the attorney
general. Justice Department lawyers who were willing to go along with
warrantless wiretapping argued that the data mining raised greater
constitutional concerns.
In 2003, after a Pentagon plan to create a data-mining operation known
as the Total Information Awareness program was disclosed, a firestorm of
protest forced the Bush administration to back off.
But since then, the intelligence community’s data-mining operations have
grown enormously, according to industry and intelligence experts.
The confrontation in Mr. Ashcroft’s hospital room took place just one
month after a Harvard undergraduate, Mark Zuckerberg, created Facebook;
Twitter would not be founded for two more years. Apple’s iPhone and iPad
did not yet exist.
“More and more services like Google and Facebook have become huge
central repositories for information,” observed Dan Auerbach, a
technology analyst with the Electronic Frontier Foundation. “That’s created a pile of data that is an incredibly attractive target for law enforcement and intelligence agencies.”
The spy agencies have long been among the most demanding customers for
advanced computing and data-mining software — and even more so in recent
years, according to industry analysts. “They tell you that somewhere
there is an American who is going to be blown up,” said a former
technology executive, and “the only thing that stands between that and
him living is you.”
In 2006, the Bush administration established a program known as the Intelligence Advanced Research Projects Activity,
to accelerate the development of intelligence-related technology
intended “to provide the United States with an overwhelming intelligence
advantage over future adversaries.”
I.B.M.’s Watson, the supercomputing technology that defeated human
Jeopardy! champions in 2011, is a prime example of the power of
data-intensive artificial intelligence.
Watson-style computing, analysts said, is precisely the technology that
would make the ambitious data-collection program of the N.S.A. seem
practical. Computers could instantly sift through the mass of Internet
communications data, see patterns of suspicious online behavior and thus
narrow the hunt for terrorists.
Both the N.S.A. and the Central Intelligence Agency have been testing
Watson in the last two years, said a consultant who has advised the
government and asked not to be identified because he was not authorized
to speak.
Trilateration
Industry experts say that intelligence and law enforcement agencies also
use a new technology, known as trilateration, that allows tracking of
an individual’s location, moment to moment. The data, obtained from
cellphone towers, can track the altitude of a person, down to the
specific floor in a building. There is even software that exploits the
cellphone data seeking to predict a person’s most likely route. “It is
extreme Big Brother,” said Alex Fielding, an expert in networking and
data centers.
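The location technique described above, commonly called trilateration, can be sketched in a few lines. This is a minimal two-dimensional illustration, assuming three towers at known positions and an exact distance from each tower to the phone. It is not a description of any agency's software; the function name, coordinates and distances are hypothetical, and estimating altitude would need a third dimension and at least one more tower.

```python
import math

def trilaterate(towers, distances):
    """Estimate (x, y) from three tower positions and the distance to each.

    Subtracting the first circle equation from the other two leaves two
    linear equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances

    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if det == 0:
        # Towers on one straight line cannot fix a unique position.
        raise ValueError("towers are collinear")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

if __name__ == "__main__":
    # Hypothetical towers (coordinates in km) and a phone at (3, 4).
    towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    phone = (3.0, 4.0)
    distances = [math.dist(t, phone) for t in towers]
    print(trilaterate(towers, distances))  # approximately (3.0, 4.0)
```

In practice the distances are inferred from signal timing or strength and carry error, so real systems fit a best estimate across many towers rather than solving the equations exactly.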
In addition to opening the Utah data center, reportedly scheduled for
this year, the N.S.A. has secretly enlarged its footprint inside the United
States, according to accounts from whistle-blowers in recent years.
In Virginia, a telecommunications consultant reported, Verizon had set
up a dedicated fiber-optic line running from New Jersey to Quantico,
Va., home to a large military base, allowing government officials to
gain access to all communications flowing through the carrier’s
operations center.
In Georgia, an N.S.A. official said in interviews, the agency had combed
through huge volumes of routine e-mails to and from Americans.
And in San Francisco, a technician at AT&T reported on the
existence of a secret room there reserved for the N.S.A. that allowed
the spy agency to copy and store millions of domestic and international
phone calls routed through that station.
Nothing revealed in recent days suggests that N.S.A. eavesdroppers have
violated the law by targeting ordinary Americans. On Friday, President Obama
defended the agency’s collection of phone records and other metadata,
saying it did not involve listening to conversations or reading the
content of e-mails. “Some of the hype we’ve been hearing over the past
day or so — nobody has listened to the content of people’s phone calls,”
he said.
Mr. Rotenberg, referring to the constitutional limits on search and
seizure, said, “It is a bit of a fantasy to think that the government
can seize so much information without implicating the Fourth Amendment interests of American citizens.”