[ISN] Reading Saddam's Email - What to do with an enemy's hard drives.

Mon Jan 30 01:33:49 EST 2006

Forwarded from: William Knowles <wk at c4i.org>

http://www.weeklystandard.com/Content/Public/Articles/000/000/006/652zozfg.asp

by Michael Tanji 
02/06/2006
Volume 011, Issue 20 

STEPHEN F. HAYES has written extensively in these pages about a large 
cache of documents and digital media captured in the course of 
Operation Iraqi Freedom and Operation Enduring Freedom. As a former 
intelligence officer who dealt with digital media exploitation and 
analysis issues at the Defense Intelligence Agency for nearly four 
years (2001 to 2005), I am prohibited from speaking publicly about 
what these documents may contain. What I can do is share my 
professional opinion on how one might solve some of the major problems 
associated with media exploitation.

Let us assume hypothetically that the United States has overthrown a 
hostile regime, and a vast amount of paper and digital media has been 
looted or otherwise removed from the regime's ministries, industrial 
centers, and other facilities. A great deal of this material has been 
obtained by the U.S. military and eventually the U.S. intelligence 
services.

Because of the lack of context--reliable information about where each 
item was obtained, who it belonged to, and so on--U.S. intelligence is 
faced with trying to make sense of a massive, amorphous heap of paper 
and digital data.

The demands are tremendous. Combat commanders need actionable 
intelligence so they can turn around and capture or kill more of the 
enemy (and obtain still more media to exploit). But technical 
expertise and high-end equipment are hard to come by. So is good, 
trustworthy linguistic support. Subject matter experts are by and 
large still back in Washington. Given the problems, how does U.S. 
intelligence perform deep analysis on data that clearly need it?

The process of exploitation begins with the recognition that neither 
human intelligence nor signals intelligence is the be-all and end-all. 
Human sources can lie. They can hide parts of the truth. Unwitting 
dupes in a deception scheme can honestly tell you what they think is 
the truth. Intercepted signals generally reveal only part of the 
intelligence picture. In a complex web of bad guys, tapping the phones 
of one or two leaves a lot of gaps, especially when your adversary is 
a whole network of webs.

Digital media, on the other hand, are less likely to serve as a means of 
deception, and even one node of a network can reveal a significant 
amount about the entire network. Think about the data that you keep on 
your computers at work and at home. Unless you write fiction for a 
living, these are the most accurate and factual data that can be 
obtained about you (short of reading your mind). The memos and letters 
you write, the financial information you calculate, the websites you 
visit, and the people you email or instant-message--all this is a gold 
mine for anyone looking to know who you are, what you do, and with 
whom you cavort. Now imagine having access to the same data about your 
adversary.

Enter "computer forensics." Exploiting paper documents is a relatively 
simple matter of reading and, if necessary, translating. Exploiting 
digital media is another story. Before you can read the data, you have 
to find it.

Outside the intelligence field, computer forensics is the process by 
which data are extracted, preserved, and analyzed for pertinence and 
meaning. The computer forensics community has worked very hard to 
bring its practices up to the level portrayed on TV in shows like CSI, 
to the point that digital evidence is now accepted in court as readily 
as fingerprints or blood spatter.

It stands to reason that the same people, tools, and methods used in 
computer crime labs are also used in intelligence efforts. However, 
the courtroom-centric, linear, law-enforcement mindset is actually a 
hindrance to effective exploitation for purposes of intelligence. A 
military intelligence unit is not interested in going to court; it is 
interested in helping soldiers put steel on target. This is not to say 
that a law enforcement approach has no use in the larger intelligence 
business (for example, in counterintelligence investigations), but if 
the goal is good data fast, then what is good for cops is not good for 
soldiers.

ASSUME OUR HYPOTHETICAL hostile regime was a fairly large country with 
a population around 25 million. It was not the most technically 
advanced nation in the world, but it had ministries and industries and 
was believed to have advanced weapons capabilities. All these needed 
computers to function. How much data does this translate into? 
Consider some rough calculations.

One floor of an average-sized university library full of academic 
journals contains about 100 gigabytes of data, the size of a large but 
not uncommon hard drive. The data in 100 such hard drives (roughly 10 
terabytes) are comparable to the print holdings of the Library of 
Congress. Care to guess whether our formerly hostile regime had more 
than 100 computers?
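
The arithmetic behind that comparison can be sketched in a few lines
of Python. The 100-gigabyte drive size is the article's own round
figure; the function name and the larger drive counts are hypothetical,
chosen only to show how quickly the total grows.

```python
# Back-of-the-envelope sizing, using the article's round figure of
# 100 GB per captured hard drive. All names here are illustrative.
GB_PER_DRIVE = 100
GB_PER_TB = 1000  # decimal terabytes, as drive makers count them


def total_terabytes(drive_count, gb_per_drive=GB_PER_DRIVE):
    """Return the raw volume, in terabytes, of a pile of captured drives."""
    return drive_count * gb_per_drive / GB_PER_TB


if __name__ == "__main__":
    # 100 drives is the article's Library of Congress comparison;
    # the larger counts are assumptions for illustration only.
    for drives in (100, 1_000, 10_000):
        print(f"{drives:>6} drives -> {total_terabytes(drives):,.0f} TB")
```

Even a modest multiple of the 100-drive figure puts the total well
beyond what examiners could read file by file.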

As if sheer quantity of data were not problem enough, remember that 
the materials have almost no supporting contextual information. A 
computer forensics examiner in a crime lab generally has access to the 
investigators, knows the nature of the crime, and knows the most 
common places to look for evidence. A piece of evidence comes to him 
in a plastic bag with a tag on it saying where it was found, what kind 
of computer it came out of, and so on.

On the battlefield there is no time to "bag-and-tag" evidence. You 
find something that looks useful; you grab it, secure it, and move on. 
When the mission is over, you head to the tent where the Military 
Intelligence guys hang out and drop off your goods, covered in dust 
and a lot worse for wear. Under such conditions, context beyond a 
label reading "hard drive found on Monday" is scarce.

You have a huge store of data, only the slightest idea where it came 
from, and only a vague idea of what to look for. Yet you must do the job 
to a standard of proof mindlessly imported from law enforcement and 
far exceeding what is necessary for your work. Is it any wonder that 
some consider the job hopeless? How can we hope to make any real sense 
of this mass of stuff?

Technology can help. First, when data come without any meaningful 
context, we have to re-create it after the fact. We begin to do this 
by building lists of keywords, phrases, personalities, and other data 
that pertain to the topics of interest to our intelligence services. 
These lists can easily include tens of thousands of terms, names, 
figures, and data formats.
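
As a minimal sketch of such a list in use, the Python below compiles
a handful of terms into one pattern and counts the hits in a piece of
text. The terms, the sample sentence, and the function names are all
hypothetical. A list of tens of thousands of entries would probably
call for a dedicated multi-pattern matcher rather than one large
regular expression.

```python
import re
from collections import Counter


def build_matcher(terms):
    """Compile one case-insensitive pattern that matches any listed term."""
    if not terms:
        raise ValueError("the watch list must contain at least one term")
    # Longest terms first, so a full phrase wins over a shorter term
    # that happens to begin it.
    ordered = sorted(set(terms), key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def count_hits(matcher, text):
    """Return a Counter of watch-list terms found in text (lowercased)."""
    return Counter(m.group(0).lower() for m in matcher.finditer(text))


if __name__ == "__main__":
    # Hypothetical watch list and document text.
    watch_list = ["centrifuge", "Ministry of Industry", "procurement"]
    sample = ("The Ministry of Industry approved procurement of a "
              "centrifuge. Procurement is urgent.")
    print(count_hits(build_matcher(watch_list), sample))
```

Documents with many hits, or with hits on rare terms, can be moved to
the front of the queue for translators and analysts.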

The next step is to create a forensically sound process to spin off 
the more meaningful pieces of data (user-created documents, emails, 
spreadsheets, etc.) while leaving behind data that have less utility 
(files associated with the operating system and software 
applications). Let's call this our forensic centrifuge.
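
One crude way to approximate that separation is to sort file paths by
extension and by directory. The sketch below does only that, and it
assumes a file listing has already been taken from a read-only copy
of the drive. The extension set, the directory names, and the
function names are hypothetical rules, not a description of any real
tool.

```python
from pathlib import PurePosixPath

# Hypothetical rules; a real system would need far richer ones.
USER_EXTENSIONS = {".doc", ".xls", ".ppt", ".pdf", ".txt", ".rtf", ".eml", ".pst"}
SYSTEM_DIR_NAMES = {"windows", "winnt", "program files"}


def is_user_data(path):
    """Guess whether a file path points at user-created content."""
    # Normalize Windows-style separators and case before inspecting parts.
    p = PurePosixPath(path.replace("\\", "/").lower())
    if any(part in SYSTEM_DIR_NAMES for part in p.parts):
        return False
    return p.suffix in USER_EXTENSIONS


def spin(paths):
    """Split paths into (keep, discard) lists, preserving input order."""
    keep, discard = [], []
    for path in paths:
        (keep if is_user_data(path) else discard).append(path)
    return keep, discard


if __name__ == "__main__":
    listing = [
        "C:\\Documents and Settings\\user\\memo.doc",
        "C:\\WINDOWS\\system32\\readme.txt",
        "D:\\data\\setup.exe",
    ]
    keep, discard = spin(listing)
    print("keep:", keep)
    print("discard:", discard)
```

Extension and path rules are easy to fool, so a fuller version would
likely also check file contents and compare hashes against known
operating-system files.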

Ideally our centrifuge will be built out of a cluster of computers: 
dozens of cheap processors networked together and scaled to rival a 
supercomputer in power. Cluster computers have been used by academia 
and the government for years, notably in places like NASA and the 
Department of Energy.

Computer programs written to take advantage of the multiprocessor 
capabilities of the centrifuge will extract the easy-to-obtain data 
files, recover deleted files and those that have been obfuscated by 
various means, and find the data stored in web browsers, email 
software, and other programs. There are commercial applications that 
do this, but our applications will have to be custom-made.
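
The article does not say how deleted files would be recovered. One
widely used technique is signature-based carving: scanning the raw
bytes of a drive image for the fixed byte sequences with which common
file formats begin. The sketch below only locates those headers; it
does not work out where each file ends or reassemble fragments. The
function name and the sample buffer are hypothetical.

```python
# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = {
    "pdf": b"%PDF",
    "jpeg": b"\xff\xd8\xff",
    "zip": b"PK\x03\x04",
    "ole2": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",  # legacy Office documents
}


def find_signatures(raw, signatures=SIGNATURES):
    """Return sorted (offset, kind) pairs for every signature found in raw."""
    hits = []
    for kind, magic in signatures.items():
        start = raw.find(magic)
        while start != -1:
            hits.append((start, kind))
            start = raw.find(magic, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # A made-up stand-in for a raw drive image: padding, a PDF header,
    # more padding, then a JPEG header.
    image = (b"\x00" * 16 + b"%PDF-1.4" + b"\x00" * 8
             + b"\xff\xd8\xff\xe0" + b"junk")
    for offset, kind in find_signatures(image):
        print(f"{kind} header at byte offset {offset}")
```

Because each region of a drive image can be scanned independently,
work of this kind divides naturally across the many processors of a
cluster.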

Once we have this notional system, we can aim it at our amorphous heap 
of captured data. The result should be large but much more meaningful 
subsets of data that we can be reasonably assured were created by 
members of the former regime. The problem of authenticity that 
sometimes complicates the exploitation of paper documents hardly 
arises.

While we now have all the meaningful data we can obtain, there is one 
more step to take before we can overlay what is called our "contextual 
appliqué." Our extracted data files must be compared with files of the 
same type--another computer process easily crafted--for both physical 
and content similarities. Through this process we should be able to 
determine things like:

* the names of people who drafted, edited, and were expected to 
  receive memorandums, letters, and orders, and sometimes which 
  computers they worked on;

* which computers were likely networked together, within the same 
  ministry or between trusted associates;

* discussions between former regime elements in the form of both 
  memorandums and email exchanges, as well as the personal thoughts 
  revealed in private letters between confidants; and

* the foreign contacts of former regime elements in the form of email 
  addresses and website data.
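
The comparison step that feeds findings like these might be sketched
as follows. Here "physical" similarity is read as byte-for-byte
identity, checked with a cryptographic hash, and "content" similarity
as overlap between sets of word sequences (the Jaccard measure). Both
readings are assumptions, as are the function names and the
three-word window.

```python
import hashlib


def fingerprint(data):
    """SHA-256 hex digest; identical bytes give identical fingerprints."""
    return hashlib.sha256(data).hexdigest()


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        # Too short for a full window: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical drafts of the same memo found on two machines.
    draft = "deliver the shipment to the northern depot by friday"
    final = "deliver the shipment to the southern depot by friday"
    print("same bytes:", fingerprint(draft.encode()) == fingerprint(final.encode()))
    print("content similarity:", round(jaccard(draft, final), 2))
```

Matching fingerprints across machines suggest copied files, while
high but imperfect similarity suggests drafts and revisions of the
same document.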

This information and more can be used to reconstruct both the physical 
and social networks of our former hostile regime. It can show who was 
talking to whom and who was working on what prior to the war. Our 
contextual appliqué is now complete, and many gaps left by 
insufficient prewar human and signals intelligence can be filled in.
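
A minimal sketch of the "who was talking to whom" step, using
Python's standard email parser, appears below. It assumes the
recovered messages are plain RFC 822 text. The addresses are made up,
and the function names are hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def edges_from_message(raw_message):
    """Yield (sender, recipient) address pairs from one RFC 822 message."""
    msg = message_from_string(raw_message)
    senders = [addr.lower()
               for _, addr in getaddresses(msg.get_all("From", []))
               if addr]
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr.lower() for _, addr in getaddresses(fields) if addr]
    for sender in senders:
        for recipient in recipients:
            yield (sender, recipient)


def build_graph(raw_messages):
    """Count how often each sender wrote to each recipient."""
    graph = Counter()
    for raw in raw_messages:
        graph.update(edges_from_message(raw))
    return graph


if __name__ == "__main__":
    # Made-up messages standing in for recovered mail.
    messages = [
        "From: Alice <alice@example.org>\n"
        "To: bob@example.org, Carol <carol@example.org>\n"
        "Subject: schedule\n\nbody\n",
        "From: alice@example.org\nTo: bob@example.org\n\nhi\n",
    ]
    for (sender, recipient), count in build_graph(messages).most_common():
        print(f"{sender} -> {recipient}: {count}")
```

The heaviest edges in such a graph are a reasonable first guess at
who worked most closely with whom.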

THE SYSTEM JUST DESCRIBED for sorting and organizing data is notional, 
but not fanciful. The technology exists, the mental wherewithal 
exists, and the contract vehicles exist. The problem of finding enough 
qualified, trusted Arabic speakers and translators is great, but 
familiar. If we want to do this, we know how. If we want to do it 
fast, and provide sufficient resources, we can see significant results 
this year.

Adapting widely accepted technical methodologies to the unique 
challenges our intelligence services face is merely good sense. Modern 
technologies could be put to good use by the intelligence community to 
solve data extraction, processing, analysis, and display problems, if 
only certain elements in the community could get over the 
"not-invented-here" syndrome. There are signs of progress, but it is 
slow. Let's face it: You've probably got more powerful software on 
your computer at home than the average intelligence analyst has on the 
job.

There is of course a strong political aspect to media exploitation. 
Which end of the political spectrum will come out ahead is not clear 
going in. We could very well have in our possession ample material to 
support all the reasons the public was told justified going to war--or 
we could find the opposite, or find there are no clear conclusions to 
be drawn. But unless we look, we will always be faced--in the immortal 
words of Donald Rumsfeld--with a huge cache of "unknown unknowns."

After all the detainees have been interrogated, and all of the sand at 
suspected facilities has been sifted and tested, the only way finally 
to close the book on what our hypothetical former hostile regime was 
up to is to analyze every last reliable source of data available to 
us. That is, if we are really interested in the truth.

-=-

Michael Tanji is an associate of the Terrorism Research Center. He 
opines on intelligence and security issues at groupintel.com.
