Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (5.59 MB, 702 trang )
operating system could construct a program file using little more than an
editor capable of producing binary files.
Of course this is not how programs are produced. The closest that anyone
gets to this is writing assembler code. Assembler language programming is
very low-level. Its statements, after macro expansion, usually translate into
one or at most two machine language instructions. The assembler source
code is then fed through an assembler which converts the (almost) human
readable code into machine code, generates the appropriate header and
finally outputs an executable file.
Most programs, however, are written in a high-level language such as C, C++,
COBOL and so forth. It is the task of the compiler to translate high-level
instructions into low-level machine code in the most optimal way. The
resultant machine code output is generally very efficient, although –
depending on the compiler – it may be possible to write more efficiently in
assembler language. Because different compilers manage the translation and
optimization process in different ways, they will produce different output for
the same source code. In general it is true to say that the higher level the
source language, the more scope there is for variation in the resultant
executable file since there will be more possible translations of each
high-level statement into low-level machine code.
During the compilation process, high-level features such as variable and
function names are replaced by references to addresses in memory and by
machine code instructions, which cause the appropriate address to be
accessed (in the case of variables) or jumped to (in the case of functions).
In the case of both assembler language and high-level language
programming, the output of the assembler or compilation phase is generally
not immediately executable. Instead, an intermediate file (known as an object
module or object file1) is produced. One object file is produced for each
source file compiled, regardless of the content or structure of the source
code. These object modules are then combined using a tool called a linker
which is responsible for producing the final executable file (or shared library).
The linker ensures that references to a function or variable in one object
module from another object module are correctly resolved.
1
An unfortunate nomenclature and nothing at all to do with object-oriented programming. If the source file is the subject
of the compilation process then the resultant file must be the object.
118
Java 2 Network Security
Compile
Source File
Link
Object File
Program File
Figure 43. Program Compilation and Linking
In summary then:
• An object file contains the machine code which is the actual program plus
some additional information describing any dependencies on other object
files.
• An executable file is a collection of object files with all inter-file
dependencies resolved, together with some header information which
identifies the file as executable.
5.2 The Java Development Life Cycle
Moving back to the world of Java, we see that it is a high-level programming
language and that bytecode is the low-level machine language of the JVM.
Java is an object-oriented language; that is, it deals primarily with objects and
their interrelationships. Objects are best thought of in this context as a
collection of data (fields, in Java parlance) and the functions (methods) which
operate on that data. Objects are created at run time based on templates
(classes ) defined by the programmer.
A Java source file may contain definitions for one or more classes. During
compilation each of these classes results in the generation of a single class
file. In some respects, the class file is the Java equivalent of an object module
rather than an executable program file; that is, it contains compiled machine
code, but may also contain references to methods and fields which exist in
other classes and hence in other class files.
Class files are the last stage of the development process in Java. There is no
separate link phase. Linking is performed at run time by the JVM. If a
reference is found within one class file to another, then the JVM loads the
referenced class file and resolves the references as needed.
The astute reader will deduce that this demand loading and linking requires
the class file to contain information about other class files, methods and fields
Class Files in Java 2
119
which it references, and in particular, the names of these files, fields and
methods. This is in fact the case as we shall see in 5.3, “The Java 2 Class
File Format” on page 124.
Even more astute readers may be pondering some of the following questions.
• Is it possible to compile Java source code to some machine language
other than that of the JVM?
• Is it possible to compile some other high-level language to bytecode for
the JVM?
• Is there such a thing as an assembler for Java?
• What is the relationship between the Java language and bytecode?
The simple answer to the first three questions is yes.
It is possible with the appropriate compiler (generally referred to as a native
code compiler) to translate Java source code to any other low-level machine
code, although this rather defeats the Write Once, Run Anywhere proposition
for Java programs, since the resultant executable program will only run on the
platform for which it has been compiled.
It is also possible to compile other high-level languages into Java bytecode,
possibly via an interim step in which the source code is translated into Java
source code which is in turn compiled. Bytecode compilers already exist for
Ada, COBOL, BASIC and NetREXX (a dialect of the popular REXX
programming language).
Finally, Jasmin is a freely available Java assembler which allows serious
geeks to write Java code at a level one step removed from bytecode. Java
Grinder2 is a another freely available Java assembler and disassembler and
is very simple to use. Let’s consider the following Java code:
import java.io.*;
public class Count
{
public static void main(String[] args) throws Exception
{
int count=0;
if (args.length >= 1)
Figure 44. (Part 1 of 2). Count.java
2
Java Grinder can be downloaded from http://www-personal.umich.edu/~mcafee/java/.
120
Java 2 Network Security
{
FileInputStream fis = new FileInputStream(args[0]);
try
{
while (fis.read() != -1)
count++;
System.out.println("Hi! We counted " + count + " chars.");
} // try{} block ends
catch (Exception e)
{
System.out.println("No characters counted");
System.out.println("Exception caught" + e.toString());
} // catch(){} block ends
} // if block ends
else
System.err.println("Usage: Count file_name");
} // main() method ends
} // class Count ends
Figure 45. (Part 2 of 2). Count.java
We compile this code using the Java compiler:
javac Count.java
This command produces the Count.class file. This is a simple Java program
that counts the number of characters in a file. The file name is given as an
argument on the command line. If the Count program is able to count the
characters in the file, it prints the number of characters counted, and if not, it
prints the exception. We run this program against this sample text file, called
itso.txt:
Marco Pistoia
Duane Reller
Deepak Gupta
Milind Nagnur
Ashok Ramani
Figure 46. itso.txt
Both the Count.class and itso.txt files are stored in the same directory, say
D:\itso\ch05, and we launch the command:
java Count itso.txt
Class Files in Java 2
121
This is the output we receive:
Hi! We counted 70 chars
On disassembling the class file with the freely available software Java
Grinder, we get an output file, which is shown in the following figures:
public class Count extends Object {
public void
maxstack 1
aload_0
invokespecial void Object.
return
}
public static void main(String[]) throws Exception {
maxstack 4
iconst_0
istore_1
aload_0
arraylength
iconst_1
if_icmplt label4
new FileInputStream
dup
aload_0
iconst_0
aaload
invokespecial void FileInputStream.
astore_2
try // catch1
goto label2
label1:
iinc 1 1
label2:
aload_2
invokevirtual int FileInputStream.read()
iconst_m1
if_icmpne label1
getstatic PrintStream System.out
new StringBuffer
dup
ldc "Hi! We counted "
invokespecial void StringBuffer.
iload_1
invokevirtual StringBuffer StringBuffer.append(int)
ldc " chars."
Figure 47. (Part 1 of 2). Disassembled Count.class File
122
Java 2 Network Security
invokevirtual StringBuffer StringBuffer.append(String)
invokevirtual String StringBuffer.toString()
invokevirtual void PrintStream.println(String)
catch Exception:label3
goto label5
astore_3
getstatic PrintStream System.out
ldc "No characters counted"
invokevirtual void PrintStream.println(String)
getstatic PrintStream System.out
new StringBuffer
dup
ldc "Exception caught"
invokespecial void StringBuffer.
aload_3
invokevirtual String Throwable.toString()
invokevirtual StringBuffer StringBuffer.append(String)
invokevirtual String StringBuffer.toString()
invokevirtual void PrintStream.println(String)
goto label5
getstatic PrintStream System.err
ldc "Usage: Count file_name"
invokevirtual void PrintStream.println(String)
return
catch1:
label3:
label4:
label5:
}
}
Figure 48. (Part 2 of 2). Disassembled Count.class File
On assembling it again, we get the same functioning as the original class file.
Notice that even if someone changes your code by simply changing the
message:
Hi! We counted count chars
to something undesirable like:
Hi! Guess what else I did to this program
the result can be disturbing. It is possible to manipulate it even further and
add statements that can vary from serious things like reading files from your
system to merely annoying things like throwing up continuous messages.
Class files are most vulnerable when they are in transit along the information
superhighway. There are ways to help prevent or at least detect this
tampering. The Java 2 SDK provides tools for sealing classes in JAR files, as
Class Files in Java 2
123
we will see in 12.1.1, “Manifest File” on page 387 and 12.6, “The JAR Bug –
Fixed In Java 2 SDK, Standard Edition, V1.2.1” on page 461.
The following figure gives a pictorial model of how different languages, such
as COBOL, C++, NetREXX and Java, are compiled in different ways, as we
discussed in 5.1, “The Traditional Development Life Cycle” on page 117:
e
tiv
Na
COBOL
Source
r
ile
mp
Co
Bytecode
Compiler
Object
Module
Li
nk
Class
File
Executable
File
Link
C++
Source
Bytecode
Compiler
Object
Module
NetREXX
Source
Native Compiler
By
t
Co ecod
mp e
iler
Na
tiv
eC
om
pil
er
ad
Java
Source
Lo
nk
Li
Object
Module
Class
File
Load
Java
Virtual
Machine
ad
Lo
Class
File
Figure 49. Compiler Models
5.3 The Java 2 Class File Format
The class file contains a lot more information than its cousin, the executable
file. Of course, it still contains the same type of information: program
requirements, an identifier indicating that this is a program and executable
code (bytecode, in this case). However, it also contains some very rich
information about the original source code.
124
Java 2 Network Security
The high level structure of a class file is shown in the following table:
Table 2. Class File Contents
Field
Description
Magic number
Four bytes identifying this file as a Java class file. Always set to 0xCAFEBABE
JVM minor version
The minor version number of the JVM on which this class file is intended to run
JVM major version
The major version number of the JVM on which this class file is intended to run
Constant pool count
Number of entries in Constant Pool Table
Constant pool
See 5.4, “The Constant Pool” on page 129
Access flags
Mask of modifiers used with class and interface declaration
Class name
The name of this class
Super class name
The name of the superclass in the Java class hierarchy
Interfaces count
Number of direct super interfaces
Interfaces
Description of the interfaces implemented for this class
Fields count
Number of structures in the fields table
Fields
Description of the class variables defined for this class
Methods count
Number of structures in the methods table
Methods
Description of the methods declared by this class
Attributes count
Number of attributes in the attributes table
Attributes
Attributes associated with the class file
Much here is as we would expect. There is information to identify the file as a
Java class file, as well as the JVM on which it was compiled to run. In
addition, there is information describing the dependencies of this class in
terms of classes, interfaces3, fields, and methods. There is much more
information than this however, buried within the constant pool (see 5.4, “The
Constant Pool” on page 129): information which includes variable and method
names within both this class file and those on which it depends.
Let’s explain in more detail the fields listed in Table 2:
3
Each Java class has only a single superclass, and it inherits variables and methods from that superclass and all its
superclasses. This limitation makes the relationship between classes easy to understand and design, but it can also be
restrictive. To solve this problem, Java introduces the concept of interfaces, which collect method names (not
implementations) into one place, and then allow you to add those methods as a group to the various classes that need
them.
Class Files in Java 2
125
• The magic number is a hexadecimal number identifying the class format
and is always 0xCAFEBABE4.
• The values of minor version and major version are the minor and major
versions of the compiler that produced this class.
• The constant pool is a table of variable length structures representing
various string constants, class names, field names, and other constants
that are referred to.
• The access flag is a mask of modifiers used with the class and interface
declarations (for example, ACC_PUBLIC for public class or interface,
ACC_FINAL for a final class etc. – see 2.1.1.2, “Access to Classes, Fields
and Methods” on page 42).
• The interfaces field is an array of entries describing the interfaces
implemented by the class.
• The fields field is an array of entries describing the class variables
declared by this class or interface. It does not include those inherited.
• The methods field is an array of entries describing the methods declared
by this class or interface.
• The only attribute defined for the attributes table is SourceFile, which
indicates the name of the source file from which the class was created.
In addition to managing dynamic linking, the JVM must also ensure that class
files contain only legal bytecode and do not attempt to subvert the run-time
environment, and to do this, still more information is required in the class.
More details of how this works are in Chapter 6, “The Class Loader and Class
File Verifier” on page 145.
The main thing to understand at this point is that the inclusion of all of this
information makes the job of a hacker much simpler in many ways. We
discuss this in the next section.
5.3.1 Decompilation Attacks
One of the areas seldom discussed when considering security
implications of deploying Java is that of securing Java assets. Often
considerable effort is put into developing software and the resultant
intellectual property can be very valuable to a company.
Hackers are a clever (although potentially misguided) bunch and there are
many reasons why they might want to get inside your code. Here are a
few:
4
Just out of curiosity, 0xCAFEBABE corresponds to the decimal number 3405691582.
126
Java 2 Network Security
• To steal a valuable algorithm for use in their own code
• To understand how a security function works to enable them to bypass it
• To extract confidential information (such as hard-coded passwords and
keys)
• To enable them to alter the code so that it behaves in a malicious way
(such as installing Trojan horses or viruses)
• To demonstrate their prowess
• For their entertainment (much as other people might solve crosswords)
The chief tool in the arsenal of the hacker in these cases is the decompiler. A
decompiler, as its name suggests, undoes the work performed by a compiler.
That is, it takes an executable file and attempts to re-create the original
source code.
Advances in compiler technology now make it effectively impossible to go
from machine code to a high-level language such as C. Modern compilers
remove all variable and function names, move code about to optimize its
execution profile and, as was discussed previously, there are many possible
ways to translate a high-level statement into a low-level machine code
representation. For a decompiler, to produce the original source code is
impossible without a lot of additional information which simply is not shipped
in an executable file.
It is, however, very easy to recover an assembler language version of the
program. On the other hand, the amount of effort required to actually
understand what such a program does makes it far less worthwhile to the
hacker to do.5 So, it is fair to say that it is impossible to completely protect any
program from tampering.
When the Java Development Kit (JDK) 1.0.2 was shipped, a decompiler
named Mocha was quickly available which performed excellently. It was able
to recover Java source code from a class file. It was so successful that at
least one person used it as a way of formatting his source code! In fact the
only information lost in the compilation process (and unrecoverable using
Mocha) are the comments. However, if meaningful variable names are used
in the code (such as accountNumber, or password), then it is readily possible to
understand the function of the code, even without the comments.
5
Nevertheless, it is done. Much pirated software is distributed in a cracked format, that is, with software protection
disabled or removed.
Class Files in Java 2
127
Already, there are decompilers available, like SourceAgain6, which can
decompile Java codes including those programs written with the Java 2 SDK
using new APIs.
Here is what a test decompiler returned for the same Count.class file we used
in 5.2, “The Java Development Life Cycle” on page 119 (the originating
source code Count.java was shown in Figure 44 on page 120 and Figure 45
on page 121):
import java.io.FileInputStream;
import java.io.PrintStream;
public class Count
{
public static void main(String[] as) throws Exception
{
int i = 0;
if (as.length >= 1)
{
FileInputStream fileinputstream1 = new FileInputStream (as[0]);
try
{
while (fileinputstream1.read() != -1)
++i;
System.out.println ("Hi! We counted " + i + " chars.");
}
catch(Exception exception1)
{
System.out.println("No characters counted");
System.out.println("Exception caught" + exception1.toString());
}
}
else
System.err.println("Usage: Count file_name");
}
}
Figure 50. Decompiled Count.class
You can see that the code has been successfully decompiled. Only small
things like the name of the variables are changed.
There can be some advantages of having a decompiler:
1. Recovery of lost source code (by accident or otherwise)
6
See http://www.ahpah.com/product.html.
128
Java 2 Network Security