[ISN] Reading Saddam's Email - What to do with an enemy's hard drives.
isn at c4i.org
Mon Jan 30 01:33:49 EST 2006
Forwarded from: William Knowles <wk at c4i.org>
by Michael Tanji
Volume 011, Issue 20
STEPHEN F. HAYES has written extensively in these pages about a large
cache of documents and digital media captured in the course of
Operation Iraqi Freedom and Operation Enduring Freedom. As a former
intelligence officer who dealt with digital media exploitation and
analysis issues at the Defense Intelligence Agency for nearly four
years (2001 to 2005), I am prohibited from speaking publicly about
what these documents may contain. What I can do is share my
professional opinion on how one might solve some of the major problems
associated with media exploitation.
Let us assume hypothetically that the United States has overthrown a
hostile regime, and a vast amount of paper and digital media has been
looted or otherwise removed from the regime's ministries, industrial
centers, and other facilities. A great deal of this material has been
obtained by the U.S. military and eventually the U.S. intelligence
community.
Because of the lack of context--reliable information about where each
item was obtained, who it belonged to, and so on--U.S. intelligence is
faced with trying to make sense of a massive, amorphous heap of paper
and digital data.
The demands are tremendous. Combat commanders need actionable
intelligence so they can turn around and capture or kill more of the
enemy (and obtain still more media to exploit). But technical
expertise and high-end equipment are hard to come by. So is good,
trustworthy linguistic support. Subject matter experts are by and
large still back in Washington. Given the problems, how does U.S.
intelligence perform deep analysis on data that clearly need it?
The process of exploitation begins with the recognition that neither
human intelligence nor signals intelligence is the be-all and end-all.
Human sources can lie. They can hide parts of the truth. Unwitting
dupes in a deception scheme can honestly tell you what they think is
the truth. Intercepted signals generally reveal only part of the
intelligence picture. In a complex web of bad guys, tapping the phones
of one or two leaves a lot of gaps, especially when your adversary is
a whole network of webs.
Digital media, on the other hand, are less prone to be a means of
deception, and even one node of a network can reveal a significant
amount about the entire network. Think about the data that you keep on
your computers at work and at home. Unless you write fiction for a
living, these are the most accurate and factual data that can be
obtained about you (short of reading your mind). The memos and letters
you write, the financial information you calculate, the websites you
visit, and the people you email or instant-message--all this is a gold
mine for anyone looking to know who you are, what you do, and with
whom you cavort. Now imagine having access to the same data about your
enemy.
Enter "computer forensics." Exploiting paper documents is a relatively
simple matter of reading and, if necessary, translating. Exploiting
digital media is another story. Before you can read the data, you have
to find it.
Outside the intelligence field, computer forensics is the process by
which data are extracted, preserved, and analyzed for pertinence and
meaning. The computer forensics community has worked very hard to
bring its practices up to the level of the forensic science portrayed
on TV in shows like CSI, to the point where digital evidence is now
accepted in court as readily as fingerprints or blood spatter.
It stands to reason that the same people, tools, and methods used in
computer crime labs are also used in intelligence efforts. However,
the courtroom-centric, linear, law-enforcement mindset is actually a
hindrance to effective exploitation for purposes of intelligence. A
military intelligence unit is not interested in going to court; it is
interested in helping soldiers put steel on target. This is not to say
that a law enforcement approach has no use in the larger intelligence
business (for example, in counterintelligence investigations), but if
the goal is good data fast, then what is good for cops is not good for
ASSUME OUR HYPOTHETICAL hostile regime was a fairly large country with
a population around 25 million. It was not the most technically
advanced nation in the world, but it had ministries and industries and
was believed to have advanced weapons capabilities. All these needed
computers to function. How much data does this translate into?
Consider some rough calculations.
One floor of an average-sized university library full of academic
journals contains about 100 gigabytes of data, the size of a large but
not uncommon hard drive. The data in 100 such hard drives are
comparable to the print holdings of the Library of Congress. Care to
guess whether our formerly hostile regime had more than 100 computers?
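
The rough calculation above can be written out as a few lines of
arithmetic. This is only a sketch: the two constants come from the
figures in the text, while the function name and the drive counts in
the loop are hypothetical values chosen for illustration, not
estimates of any real holdings.

```python
# Rough data-volume arithmetic based on the two figures given in the text.
GB_PER_DRIVE = 100     # from the text: one library floor is about one 100 GB drive
DRIVES_PER_LOC = 100   # from the text: 100 such drives ~ Library of Congress print holdings


def estimate_volume(num_drives):
    """Return total size and Library-of-Congress equivalents for num_drives."""
    total_gb = num_drives * GB_PER_DRIVE
    return {
        "total_gb": total_gb,
        "total_tb": total_gb / 1000,  # decimal terabytes
        "loc_equivalents": num_drives / DRIVES_PER_LOC,
    }


# Hypothetical drive counts, chosen only to show how quickly the total grows.
for n in (100, 1_000, 10_000):
    est = estimate_volume(n)
    print(f"{n:>6} drives: {est['total_tb']:>7.1f} TB, "
          f"{est['loc_equivalents']:.0f}x Library of Congress print holdings")
```
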
As if sheer quantity of data were not problem enough, remember that
the materials have almost no supporting contextual information. A
computer forensics examiner in a crime lab generally has access to the
investigators, knows the nature of the crime, and knows the most
common places to look for evidence. A piece of evidence comes to him
in a plastic bag with a tag on it saying where it was found, what kind
of computer it came out of, and so on.
On the battlefield there is no time to "bag-and-tag" evidence. You
find something that looks useful; you grab it, secure it, and move on.
When the mission is over, you head to the tent where the Military
Intelligence guys hang out and drop off your goods, covered in dust
and a lot worse for wear. Under such conditions, context beyond a
label reading "hard drive found on Monday" is scarce.
You have a huge store of data, only the slightest idea where it came
from, and only a vague idea of what to look for. Yet you must do the
job to a standard of proof mindlessly imported from law enforcement,
one that far exceeds what your work requires. Is it any wonder that
some consider the job hopeless? How can we hope to make any real sense
of this mass of stuff?
Technology can help. First, when data come without any meaningful
context, we have to re-create it after the fact. We begin to do this
by building lists of keywords, phrases, personalities, and other data
that pertain to the topics of interest to our intelligence services.
These lists can easily include tens of thousands of terms, names,
figures, and data formats.
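
A minimal sketch of what matching such a list against extracted text
might look like is given here. The function names and the three
sample terms are hypothetical. A single compiled regular expression
is assumed to be adequate for illustration; with tens of thousands of
terms, a dedicated multi-pattern matcher (an Aho-Corasick automaton,
for example) would likely scale better.

```python
import re
from collections import Counter


def build_matcher(terms):
    """Compile one case-insensitive regex matching any term as a whole word or phrase."""
    if not terms:
        raise ValueError("the term list must not be empty")
    # Longest terms first, so multi-word phrases win over their sub-words.
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in ordered)
    # The lookarounds stop a term from matching inside a longer word.
    return re.compile(rf"(?<!\w)(?:{pattern})(?!\w)", re.IGNORECASE)


def score_document(text, matcher):
    """Count how often each watch-list term occurs in one document."""
    return Counter(m.group(0).lower() for m in matcher.finditer(text))


# Hypothetical watch list and document, for illustration only.
watchlist = ["procurement", "centrifuge", "ministry of industry"]
matcher = build_matcher(watchlist)
sample = ("The Ministry of Industry approved the procurement. "
          "Procurement of parts continues.")
hits = score_document(sample, matcher)
print(hits.most_common())
```

Documents with no hits are not discarded by this sketch; the counts
simply give an analyst a way to rank what to read first.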
The next step is to create a forensically sound process to spin off
the more meaningful pieces of data (user-created documents, emails,
spreadsheets, etc.) while leaving behind data that have less utility
(files associated with the operating system and software
applications). Let's call this our forensic centrifuge.
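
One simple way to picture the centrifuge is as a triage function that
sorts file paths into "user", "system", and "unknown" bins. The
sketch below assumes Windows-style paths and uses a hypothetical
extension list and directory list; neither comes from the article. A
production process would more likely rely on hash sets of known
operating-system and application files than on names alone.

```python
from pathlib import PureWindowsPath

# Hypothetical lists, for illustration only.
USER_EXTENSIONS = {".doc", ".xls", ".ppt", ".pdf", ".txt", ".rtf", ".eml", ".pst"}
SYSTEM_DIR_NAMES = {"windows", "winnt", "system32", "program files"}


def triage(path):
    """Classify one file path as 'system', 'user', or 'unknown'."""
    p = PureWindowsPath(path)
    parts = {part.lower() for part in p.parts}
    if parts & SYSTEM_DIR_NAMES:
        return "system"   # lives under an OS or application directory
    if p.suffix.lower() in USER_EXTENSIONS:
        return "user"     # looks like a user-created document
    return "unknown"      # needs a closer look (content-based typing, etc.)


def spin(paths):
    """Sort a collection of paths into the three bins."""
    bins = {"user": [], "system": [], "unknown": []}
    for path in paths:
        bins[triage(path)].append(path)
    return bins


sample_paths = [
    r"C:\WINDOWS\system32\kernel32.dll",
    r"C:\Documents and Settings\user\My Documents\memo.doc",
    r"D:\dump\blob.bin",
]
print(spin(sample_paths))
```
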
Ideally our centrifuge will be built out of a cluster of computers:
dozens of cheap processors networked together and scaled to rival a
supercomputer in power. Cluster computers have been used by academia
and the government for years, notably in places like NASA and the
Department of Energy.
Computer programs written to take advantage of the multiprocessor
capabilities of the centrifuge will extract the easy-to-obtain data
files, recover deleted files and those that have been obfuscated by
various means, and find the data stored in web browsers, email
software, and other programs. There are commercial applications that
do this, but our applications will have to be custom-made.
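
The article does not say how deleted files would be recovered. One
widely used technique, shown here purely as an illustration, is
signature-based carving: scanning raw disk bytes for the known start
and end markers of a file type. The sketch below carves JPEG images
only, and the function name and size limit are hypothetical. It is
deliberately naive, since real images can contain the end marker
inside embedded thumbnails.

```python
# JPEG files start with FF D8 FF and end with FF D9.
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(raw, max_size=10_000_000):
    """Return byte strings that look like JPEG files inside a raw byte buffer."""
    carved = []
    pos = 0
    while True:
        start = raw.find(JPEG_HEADER, pos)
        if start == -1:
            break                      # no more headers
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break                      # header without a footer: stop
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            carved.append(raw[start:end])
            pos = end                  # continue after the carved file
        else:
            pos = start + len(JPEG_HEADER)  # implausibly large: skip this header
    return carved


# Synthetic buffer standing in for unallocated disk space.
blob = (b"\x00" * 16 + b"\xff\xd8\xff\xe0" + b"AAAA" + b"\xff\xd9"
        + b"junk" + b"\xff\xd8\xff\xe1" + b"BB" + b"\xff\xd9")
print(len(carve_jpegs(blob)), "candidate files carved")
```

Because each drive image can be carved independently of the others,
work of this kind divides naturally across the processors of a
cluster.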
Once we have this notional system, we can aim it at our amorphous heap
of captured data. The result should be large but much more meaningful
subsets of data that we can be reasonably assured were created by
members of the former regime. The problem of authenticity that
sometimes complicates the exploitation of paper documents virtually
does not arise.
While we now have all the meaningful data we can obtain, there is one
more step to take before we can overlay what is called our "contextual
appliqué." Our extracted data files must be compared with files of the
same type--another computer process easily crafted--for both physical
and content similarities. Through this process we should be able to
determine things like:
* the names of people who drafted, edited, and were expected to
receive memorandums, letters, and orders, and sometimes which
computers they worked on;
* which computers were likely networked together, within the same
ministry or between trusted associates;
* discussions between former regime elements in the form of both
memorandums and email exchanges, as well as the personal thoughts
revealed in private letters between confidants; and
* the foreign contacts of former regime elements in the form of email
addresses and website data.
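
The comparison step can be sketched in two parts, under the
assumption that "physical" similarity means byte-identical copies and
"content" similarity means overlapping wording. A cryptographic hash
handles the first; Jaccard similarity over word shingles is one
common choice for the second. The function names and sample sentences
are hypothetical, and the article does not specify which measures
would be used.

```python
import hashlib


def fingerprint(data):
    """SHA-256 digest: identical digests mean byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def shingles(text, k=3):
    """Return the set of k-word sequences that occur in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two hypothetical drafts of the same memo, differing by one word.
draft_1 = "the minister approved the transfer of funds to the factory"
draft_2 = "the minister approved the transfer of parts to the factory"

print(fingerprint(draft_1.encode()) == fingerprint(draft_2.encode()))
print(round(jaccard(shingles(draft_1), shingles(draft_2)), 3))
```

Files with matching fingerprints found on different drives suggest
that those machines shared documents; high shingle overlap suggests
successive drafts or a common template.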
This information and more can be used to reconstruct both the physical
and social networks of our former hostile regime. It can show who was
talking to whom and who was working on what prior to the war. Our
contextual appliqué is now complete, and many gaps left by
insufficient prewar human and signals intelligence can be filled in.
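
As an illustration of the "who was talking to whom" part, the sketch
below counts sender-to-recipient pairs from the headers of recovered
email messages, using Python's standard email parser. The addresses
and function names are hypothetical, and the sketch assumes the
messages have already been extracted as RFC 822-style text.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def edges_from_message(raw):
    """Return (sender, recipient) pairs from one message's From, To, and Cc headers."""
    msg = message_from_string(raw)
    senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
    to_and_cc = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr.lower() for _, addr in getaddresses(to_and_cc)]
    return [(s, r) for s in senders for r in recipients if s and r]


def build_graph(raw_messages):
    """Count how many messages went along each sender-to-recipient edge."""
    graph = Counter()
    for raw in raw_messages:
        graph.update(edges_from_message(raw))
    return graph


# Hypothetical messages, for illustration only.
messages = [
    "From: Alpha <alpha@example.org>\n"
    "To: bravo@example.org, charlie@example.org\n"
    "Subject: t\n\nhello\n",
    "From: alpha@example.org\n"
    "To: Bravo@Example.org\n"
    "Cc: delta@example.org\n\nhi\n",
]
graph = build_graph(messages)
for (sender, recipient), count in graph.most_common():
    print(f"{sender} -> {recipient}: {count}")
```

The edge counts give a crude measure of how closely two
correspondents were connected. Overlaying them on the
machine-to-machine links inferred from shared files is one way the
physical and social networks could be lined up.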
THE SYSTEM JUST DESCRIBED for sorting and organizing data is notional,
but not fanciful. The technology exists, the mental wherewithal
exists, and the contract vehicles exist. The problem of finding enough
qualified, trusted Arabic speakers and translators is great, but
familiar. If we want to do this, we know how. If we want to do it
fast, and provide sufficient resources, we can see significant results
Adapting widely accepted technical methodologies to the unique
challenges our intelligence services face is merely good sense. Modern
technologies could be put to good use by the intelligence community to
solve data extraction, processing, analysis, and display problems, if
only certain elements in the community could get over the
"not-invented-here" syndrome. There are signs of progress, but it is
slow. Let's face it: You've probably got more powerful software on
your computer at home than the average intelligence analyst has on the
There is of course a strong political aspect to media exploitation.
Which end of the political spectrum will come out ahead is not clear
going in. We could very well have in our possession ample material to
support all the reasons the public was told justified going to war--or
we could find the opposite, or find there are no clear conclusions to
be drawn. But unless we look, we will always be faced--in the immortal
words of Donald Rumsfeld--with a huge cache of "unknown unknowns."
After all the detainees have been interrogated, and all of the sand at
suspected facilities has been sifted and tested, the only way finally
to close the book on what our hypothetical former hostile regime was
up to is to analyze every last reliable source of data available to
us. That is, if we are really interested in the truth.
Michael Tanji is an associate of the Terrorism Research Center. He
opines on intelligence and security issues at groupintel.com.
© Copyright 2005, News Corporation, Weekly Standard,
All Rights Reserved.
"Communications without intelligence is noise; Intelligence
without communications is irrelevant." Gen. Alfred M. Gray, USMC
C4I.org - Computer Security, & Intelligence - http://www.c4i.org