1. Trang chủ >
  2. Công Nghệ Thông Tin >
  3. An ninh - Bảo mật >

Chapter 5. Class Files in Java 2

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (5.59 MB, 702 trang )


operating system could construct a program file using little more than an

editor capable of producing binary files.

Of course this is not how programs are produced. The closest that anyone

gets to this is writing assembler code. Assembler language programming is

very low-level. Its statements, after macro expansion, usually translate into

one or at most two machine language instructions. The assembler source

code is then fed through an assembler which converts the (almost) human

readable code into machine code, generates the appropriate header and

finally outputs an executable file.

Most programs, however, are written in a high-level language such as C, C++,

COBOL and so forth. It is the task of the compiler to translate high-level

instructions into low-level machine code in the most optimal way. The

resultant machine code output is generally very efficient, although –

depending on the compiler – it may be possible to write more efficiently in

assembler language. Because different compilers manage the translation and

optimization process in different ways, they will produce different output for

the same source code. In general it is true to say that the higher level the

source language, the more scope there is for variation in the resultant

executable file since there will be more possible translations of each

high-level statement into low-level machine code.

During the compilation process, high-level features such as variable and

function names are replaced by references to addresses in memory and by

machine code instructions, which cause the appropriate address to be

accessed (in the case of variables) or jumped to (in the case of functions).

In the case of both assembler language and high-level language

programming, the output of the assembler or compilation phase is generally

not immediately executable. Instead, an intermediate file (known as an object

module or object file1) is produced. One object file is produced for each

source file compiled, regardless of the content or structure of the source

code. These object modules are then combined using a tool called a linker

which is responsible for producing the final executable file (or shared library).

The linker ensures that references to a function or variable in one object

module from another object module are correctly resolved.



1

An unfortunate nomenclature and nothing at all to do with object-oriented programming. If the source file is the subject

of the compilation process then the resultant file must be the object.



118



Java 2 Network Security



Compile

Source File



Link

Object File



Program File



Figure 43. Program Compilation and Linking



In summary then:

• An object file contains the machine code which is the actual program plus

some additional information describing any dependencies on other object

files.

• An executable file is a collection of object files with all inter-file

dependencies resolved, together with some header information which

identifies the file as executable.



5.2 The Java Development Life Cycle

Moving back to the world of Java, we see that it is a high-level programming

language and that bytecode is the low-level machine language of the JVM.

Java is an object-oriented language; that is, it deals primarily with objects and

their interrelationships. Objects are best thought of in this context as a

collection of data (fields, in Java parlance) and the functions (methods) which

operate on that data. Objects are created at run time based on templates

(classes ) defined by the programmer.

A Java source file may contain definitions for one or more classes. During

compilation each of these classes results in the generation of a single class

file. In some respects, the class file is the Java equivalent of an object module

rather than an executable program file; that is, it contains compiled machine

code, but may also contain references to methods and fields which exist in

other classes and hence in other class files.

Class files are the last stage of the development process in Java. There is no

separate link phase. Linking is performed at run time by the JVM. If a

reference is found within one class file to another, then the JVM loads the

referenced class file and resolves the references as needed.

The astute reader will deduce that this demand loading and linking requires

the class file to contain information about other class files, methods and fields



Class Files in Java 2



119



which it references, and in particular, the names of these files, fields and

methods. This is in fact the case as we shall see in 5.3, “The Java 2 Class

File Format” on page 124.

Even more astute readers may be pondering some of the following questions.

• Is it possible to compile Java source code to some machine language

other than that of the JVM?

• Is it possible to compile some other high-level language to bytecode for

the JVM?

• Is there such a thing as an assembler for Java?

• What is the relationship between the Java language and bytecode?

The simple answer to the first three questions is yes.

It is possible with the appropriate compiler (generally referred to as a native

code compiler) to translate Java source code to any other low-level machine

code, although this rather defeats the Write Once, Run Anywhere proposition

for Java programs, since the resultant executable program will only run on the

platform for which it has been compiled.

It is also possible to compile other high-level languages into Java bytecode,

possibly via an interim step in which the source code is translated into Java

source code which is in turn compiled. Bytecode compilers already exist for

Ada, COBOL, BASIC and NetREXX (a dialect of the popular REXX

programming language).

Finally, Jasmin is a freely available Java assembler which allows serious

geeks to write Java code at a level one step removed from bytecode. Java

Grinder2 is a another freely available Java assembler and disassembler and

is very simple to use. Let’s consider the following Java code:



import java.io.*;

public class Count

{

public static void main(String[] args) throws Exception

{

int count=0;

if (args.length >= 1)

Figure 44. (Part 1 of 2). Count.java

2



Java Grinder can be downloaded from http://www-personal.umich.edu/~mcafee/java/.



120



Java 2 Network Security



{

FileInputStream fis = new FileInputStream(args[0]);

try

{

while (fis.read() != -1)

count++;

System.out.println("Hi! We counted " + count + " chars.");

} // try{} block ends

catch (Exception e)

{

System.out.println("No characters counted");

System.out.println("Exception caught" + e.toString());

} // catch(){} block ends

} // if block ends

else

System.err.println("Usage: Count file_name");

} // main() method ends

} // class Count ends

Figure 45. (Part 2 of 2). Count.java



We compile this code using the Java compiler:

javac Count.java



This command produces the Count.class file. This is a simple Java program

that counts the number of characters in a file. The file name is given as an

argument on the command line. If the Count program is able to count the

characters in the file, it prints the number of characters counted, and if not, it

prints the exception. We run this program against this sample text file, called

itso.txt:



Marco Pistoia

Duane Reller

Deepak Gupta

Milind Nagnur

Ashok Ramani

Figure 46. itso.txt



Both the Count.class and itso.txt files are stored in the same directory, say

D:\itso\ch05, and we launch the command:

java Count itso.txt



Class Files in Java 2



121



This is the output we receive:

Hi! We counted 70 chars



On disassembling the class file with the freely available software Java

Grinder, we get an output file, which is shown in the following figures:



public class Count extends Object {

public void () {

maxstack 1

aload_0

invokespecial void Object.()

return

}

public static void main(String[]) throws Exception {

maxstack 4

iconst_0

istore_1

aload_0

arraylength

iconst_1

if_icmplt label4

new FileInputStream

dup

aload_0

iconst_0

aaload

invokespecial void FileInputStream.(String)

astore_2

try // catch1

goto label2

label1:

iinc 1 1

label2:

aload_2

invokevirtual int FileInputStream.read()

iconst_m1

if_icmpne label1

getstatic PrintStream System.out

new StringBuffer

dup

ldc "Hi! We counted "

invokespecial void StringBuffer.(String)

iload_1

invokevirtual StringBuffer StringBuffer.append(int)

ldc " chars."

Figure 47. (Part 1 of 2). Disassembled Count.class File



122



Java 2 Network Security



invokevirtual StringBuffer StringBuffer.append(String)

invokevirtual String StringBuffer.toString()

invokevirtual void PrintStream.println(String)

catch Exception:label3

goto label5

astore_3

getstatic PrintStream System.out

ldc "No characters counted"

invokevirtual void PrintStream.println(String)

getstatic PrintStream System.out

new StringBuffer

dup

ldc "Exception caught"

invokespecial void StringBuffer.(String)

aload_3

invokevirtual String Throwable.toString()

invokevirtual StringBuffer StringBuffer.append(String)

invokevirtual String StringBuffer.toString()

invokevirtual void PrintStream.println(String)

goto label5

getstatic PrintStream System.err

ldc "Usage: Count file_name"

invokevirtual void PrintStream.println(String)

return



catch1:

label3:



label4:



label5:

}

}



Figure 48. (Part 2 of 2). Disassembled Count.class File



On assembling it again, we get the same functioning as the original class file.

Notice that even if someone changes your code by simply changing the

message:

Hi! We counted count chars



to something undesirable like:

Hi! Guess what else I did to this program



the result can be disturbing. It is possible to manipulate it even further and

add statements that can vary from serious things like reading files from your

system to merely annoying things like throwing up continuous messages.

Class files are most vulnerable when they are in transit along the information

superhighway. There are ways to help prevent or at least detect this

tampering. The Java 2 SDK provides tools for sealing classes in JAR files, as



Class Files in Java 2



123



we will see in 12.1.1, “Manifest File” on page 387 and 12.6, “The JAR Bug –

Fixed In Java 2 SDK, Standard Edition, V1.2.1” on page 461.

The following figure gives a pictorial model of how different languages, such

as COBOL, C++, NetREXX and Java, are compiled in different ways, as we

discussed in 5.1, “The Traditional Development Life Cycle” on page 117:



e

tiv

Na

COBOL

Source



r

ile

mp

Co



Bytecode

Compiler



Object

Module



Li



nk



Class

File

Executable

File



Link



C++

Source



Bytecode

Compiler



Object

Module



NetREXX

Source



Native Compiler

By

t

Co ecod

mp e

iler

Na

tiv

eC



om

pil

er



ad



Java

Source



Lo



nk

Li



Object

Module



Class

File



Load



Java

Virtual

Machine



ad

Lo



Class

File



Figure 49. Compiler Models



5.3 The Java 2 Class File Format

The class file contains a lot more information than its cousin, the executable

file. Of course, it still contains the same type of information: program

requirements, an identifier indicating that this is a program and executable

code (bytecode, in this case). However, it also contains some very rich

information about the original source code.



124



Java 2 Network Security



The high level structure of a class file is shown in the following table:

Table 2. Class File Contents



Field



Description



Magic number



Four bytes identifying this file as a Java class file. Always set to 0xCAFEBABE



JVM minor version



The minor version number of the JVM on which this class file is intended to run



JVM major version



The major version number of the JVM on which this class file is intended to run



Constant pool count



Number of entries in Constant Pool Table



Constant pool



See 5.4, “The Constant Pool” on page 129



Access flags



Mask of modifiers used with class and interface declaration



Class name



The name of this class



Super class name



The name of the superclass in the Java class hierarchy



Interfaces count



Number of direct super interfaces



Interfaces



Description of the interfaces implemented for this class



Fields count



Number of structures in the fields table



Fields



Description of the class variables defined for this class



Methods count



Number of structures in the methods table



Methods



Description of the methods declared by this class



Attributes count



Number of attributes in the attributes table



Attributes



Attributes associated with the class file



Much here is as we would expect. There is information to identify the file as a

Java class file, as well as the JVM on which it was compiled to run. In

addition, there is information describing the dependencies of this class in

terms of classes, interfaces3, fields, and methods. There is much more

information than this however, buried within the constant pool (see 5.4, “The

Constant Pool” on page 129): information which includes variable and method

names within both this class file and those on which it depends.

Let’s explain in more detail the fields listed in Table 2:

3

Each Java class has only a single superclass, and it inherits variables and methods from that superclass and all its

superclasses. This limitation makes the relationship between classes easy to understand and design, but it can also be

restrictive. To solve this problem, Java introduces the concept of interfaces, which collect method names (not

implementations) into one place, and then allow you to add those methods as a group to the various classes that need

them.



Class Files in Java 2



125



• The magic number is a hexadecimal number identifying the class format

and is always 0xCAFEBABE4.

• The values of minor version and major version are the minor and major

versions of the compiler that produced this class.

• The constant pool is a table of variable length structures representing

various string constants, class names, field names, and other constants

that are referred to.

• The access flag is a mask of modifiers used with the class and interface

declarations (for example, ACC_PUBLIC for public class or interface,

ACC_FINAL for a final class etc. – see 2.1.1.2, “Access to Classes, Fields

and Methods” on page 42).

• The interfaces field is an array of entries describing the interfaces

implemented by the class.

• The fields field is an array of entries describing the class variables

declared by this class or interface. It does not include those inherited.

• The methods field is an array of entries describing the methods declared

by this class or interface.

• The only attribute defined for the attributes table is SourceFile, which

indicates the name of the source file from which the class was created.

In addition to managing dynamic linking, the JVM must also ensure that class

files contain only legal bytecode and do not attempt to subvert the run-time

environment, and to do this, still more information is required in the class.

More details of how this works are in Chapter 6, “The Class Loader and Class

File Verifier” on page 145.

The main thing to understand at this point is that the inclusion of all of this

information makes the job of a hacker much simpler in many ways. We

discuss this in the next section.



5.3.1 Decompilation Attacks

One of the areas seldom discussed when considering security

implications of deploying Java is that of securing Java assets. Often

considerable effort is put into developing software and the resultant

intellectual property can be very valuable to a company.

Hackers are a clever (although potentially misguided) bunch and there are

many reasons why they might want to get inside your code. Here are a

few:

4



Just out of curiosity, 0xCAFEBABE corresponds to the decimal number 3405691582.



126



Java 2 Network Security



• To steal a valuable algorithm for use in their own code

• To understand how a security function works to enable them to bypass it

• To extract confidential information (such as hard-coded passwords and

keys)

• To enable them to alter the code so that it behaves in a malicious way

(such as installing Trojan horses or viruses)

• To demonstrate their prowess

• For their entertainment (much as other people might solve crosswords)

The chief tool in the arsenal of the hacker in these cases is the decompiler. A

decompiler, as its name suggests, undoes the work performed by a compiler.

That is, it takes an executable file and attempts to re-create the original

source code.

Advances in compiler technology now make it effectively impossible to go

from machine code to a high-level language such as C. Modern compilers

remove all variable and function names, move code about to optimize its

execution profile and, as was discussed previously, there are many possible

ways to translate a high-level statement into a low-level machine code

representation. For a decompiler, to produce the original source code is

impossible without a lot of additional information which simply is not shipped

in an executable file.

It is, however, very easy to recover an assembler language version of the

program. On the other hand, the amount of effort required to actually

understand what such a program does makes it far less worthwhile to the

hacker to do.5 So, it is fair to say that it is impossible to completely protect any

program from tampering.

When the Java Development Kit (JDK) 1.0.2 was shipped, a decompiler

named Mocha was quickly available which performed excellently. It was able

to recover Java source code from a class file. It was so successful that at

least one person used it as a way of formatting his source code! In fact the

only information lost in the compilation process (and unrecoverable using

Mocha) are the comments. However, if meaningful variable names are used

in the code (such as accountNumber, or password), then it is readily possible to

understand the function of the code, even without the comments.



5

Nevertheless, it is done. Much pirated software is distributed in a cracked format, that is, with software protection

disabled or removed.



Class Files in Java 2



127



Already, there are decompilers available, like SourceAgain6, which can

decompile Java codes including those programs written with the Java 2 SDK

using new APIs.

Here is what a test decompiler returned for the same Count.class file we used

in 5.2, “The Java Development Life Cycle” on page 119 (the originating

source code Count.java was shown in Figure 44 on page 120 and Figure 45

on page 121):



import java.io.FileInputStream;

import java.io.PrintStream;

public class Count

{

public static void main(String[] as) throws Exception

{

int i = 0;

if (as.length >= 1)

{

FileInputStream fileinputstream1 = new FileInputStream (as[0]);

try

{

while (fileinputstream1.read() != -1)

++i;

System.out.println ("Hi! We counted " + i + " chars.");

}

catch(Exception exception1)

{

System.out.println("No characters counted");

System.out.println("Exception caught" + exception1.toString());

}

}

else

System.err.println("Usage: Count file_name");

}

}

Figure 50. Decompiled Count.class



You can see that the code has been successfully decompiled. Only small

things like the name of the variables are changed.

There can be some advantages of having a decompiler:

1. Recovery of lost source code (by accident or otherwise)

6



See http://www.ahpah.com/product.html.



128



Java 2 Network Security



Xem Thêm
Tải bản đầy đủ (.pdf) (702 trang)

×