Mechanisms for Secure Modular Programming in Java
Lujo Bauer
Andrew W. Appel
Edward W. Felten
Princeton University
November 12, 1999
Revised Version of Technical Report CS-TR-603-99
Abstract

We present a new module system for Java that improves upon many of the deficiencies of the Java package system and gives the programmer more control over dynamic linking. Our module system provides explicit interfaces, multiple views of modules based on hierarchical nesting, and more flexible name-space management than the Java package system. Relationships between modules are explicitly specified in module description files. We provide more control over dynamic linking by allowing import statements in module description files to require that imported modules be annotated with certain properties, which we implement by digital signatures. Our module system is compatible enough with standard Java that we have implemented it as a source-to-source and bytecode-to-bytecode transformation wrapped around a standard Java compiler, using a standard JVM.

1 Introduction

The traditional method of providing software-based protection within a program is by using abstract data types and information hiding. These methods have been used extensively to make sure that objects can be written in ways that allow outsiders only carefully controlled access to their implementation details.

We argue that the building blocks of today’s object-oriented software systems, however, are not objects or classes but modules. Modules must provide a framework for information hiding and should help structure the interaction between different parts of a program. They must do this not only to protect programs from non-malicious mistakes made by other parts of the same software system, but also to protect the entire software system from malicious attack.

The Java package system [GJS96] is a module system, but its notions of information hiding and access control leave much to be desired, especially in hostile environments. Java packages have limited ability to control access to their member classes, they don’t have explicit interfaces, and they don’t support multiple views of modules. These limitations make packages too weak to be used as an information-hiding mechanism.

An additional problem confronts dynamically linked programs: a piece of code is designed to behave properly only when its unresolved symbols are matched against the particular set of external objects with which the author intended his module to be linked [Car97]. But since linking is often not under the control of the programmer who wrote the module (as in the Java virtual machine, for example), steps must be taken to ensure that after linking a program will behave in a manner consistent with the programmer’s intentions. Type checking guarantees that the types of symbols in the interfaces between modules match, but it does nothing else to ensure that the objects with which a program links will behave in the manner that the programmer expects.

Some languages, such as Standard ML [MTH90] with its associated Compilation Manager [BA99], develop the idea of module-level information hiding by providing the facility for structuring modules hierarchically. Lower levels in a module hierarchy can communicate across more expressive interfaces; higher levels can enforce more restrictive ones.

We present an ML-style hierarchical module system that improves upon Java packages by providing explicit interfaces, multiple views of modules based on hierarchical nesting, and more flexible name-space management. Building on this framework, we give the programmer more control over what external modules his code can be linked with. We use digital code signing in a more meaningful way than previous approaches. The details of the linking process remain abstract to the programmer, and the linking specifications are simple and declarative.

Our module system is compatible enough with standard Java that we have implemented it as a source-to-source and bytecode-to-bytecode transformation wrapped around a standard Java compiler and a standard JVM.

2 An Example

Our module system, like Java packages, groups classes into larger units. A module in our system consists of a set of source files and a module description file. The module description file consists of three parts:

• an export interface;
• a membership list;
• a set of import statements.

The example in Figure 1 shows what the code-generation module of a compiler might look like.

[Figure 1 shows a module description file with three labeled parts: Export Interface, Membership List, Import Interface. Recoverable file names: Codegen.java, Access.java, AccessList.java, Frame.java, Proc.java.]
Figure 1: The code-generation module from a compiler.

The export interface of the module is a filter that allows only select classes to be visible externally. In this example, the class Codegen is listed in the export interface, which means that it can be accessed by other modules, rather than just from the source code of the current one. Any classes that are not listed in the export interface remain internal to the module, as if they were declared package-scope.

The source files that comprise the module are listed in the membership list. Every class that needs to be part of the module must be defined in one of these source files. In this case, the membership list includes the source file Codegen.java which defines the Codegen class. Explicitly keeping track of the members of a module is useful both from a software engineering and a security standpoint.

The only way to reference classes that are not in the module is through the import interface, which introduces new Java identifiers that are bound to external modules, packages¹, or classes. In our example, the Codegen class needs to reference class InstrList from the module located in directory ../Assem/. The import interface therefore introduces the new identifier Assem and binds it to the required module. All the classes that are listed in the export interface of this module can now be referenced by prefixing their names with Assem; e.g., Assem.InstrList. The names of the identifiers and directories listed in the import interface in this example match only out of convenience; no feature of the module system compels them to do so.

¹ For compatibility with software written in Java that does not use our module system.

3 Description

The source files in our module system are standard Java source files, with a few exceptions:

• package and import declarations are omitted;
• the public and package-scope access modifiers in class declarations are ignored (but these access modifiers work as before for field and method declarations);
• symbols defined in the module description file may be used in the source code as identifiers;
• references to classes external to the module are allowed only via the identifiers defined in the module description file.

Java maintains separate name spaces for types, methods, and variables. The name space of types in which the
source files are compiled and executed is composed of all the classes that are defined within the current module and all classes imported through the module description file.

The syntax of module description files is given by this grammar:

    Module | Library
        [ classname | classAlias ] +
    is
        filename *
    [ imports
        ImportStatement * ]

    ImportStatement:
        moduleAlias packagename |
        moduleAlias location [ property [ , property ]* ] |
        classAlias moduleAlias.classname

where classname is the simple name of a Java interface or class (e.g., Codegen), filename is the name of a Java source file, location is a relative or absolute pathname that must end with a path separator (e.g., ../Assem/), and moduleAlias and classAlias are Java identifiers.

A typical module description file begins with the keyword Module. The keyword Library indicates that the JAR file (containing the compiled classes, the module description file, and some extra information) produced by compiling the current module should include the compiled versions of all of the modules on which the current one depends.² Otherwise, it would contain the compiled versions of only the classes defined by the current module.

² Our module system permits modules to import standard Java packages. The linkage specifications and security features of our module system, however, do not apply to them.

The keyword Module or Library is followed by a list of exported symbols. Each exported symbol is a class name, either from the set of locally defined classes or one that has been imported and aliased in the imports section of the module description file.

The keyword is concludes the list of exported symbols and starts the enumeration of the source files whose classes comprise the module.

The optional imports section can be used to establish bindings to any external classes that are to be visible to the source code of the module. Each import statement introduces a new Java identifier (moduleAlias or classAlias). A moduleAlias can be bound to a package or module, a classAlias to a particular class or interface from a module or package that has already been assigned an alias. Import statements refer to modules by their locations. A location could be a directory or a URL, though our system currently supports only directories. An import statement that binds a moduleAlias to a directory may optionally require that the module be annotated with one or more properties (see Section 5).

The aliases introduced by the module description file can be used in the source code of the module to reference classes from imported modules. The aliases may not occur bound in the source code of the module.

Modules compile into JAR files, which can be digitally signed to ensure that their contents cannot be tampered with.

4 Fixing Java Packages

Our module system contains a number of features that are missing or insufficiently developed in the Java package system. The most important are explicit export interfaces and membership lists, hierarchical scalability and multiple interfaces, and convenient name-space management. These will not only be useful for software engineering but will also enhance the security of software systems developed in Java.

Export Interfaces and Membership Lists. A well-established principle of software engineering is that the interface of a module should be separate from its implementation. This enables a client of a module to be written and type-checked against the interface before the module’s implementation is written, and allows the module’s implementation to be type-checked against the same interface to ensure that the implementation adheres to its own specification. Separating the interface from the implementation also aids in the construction of ADTs by making it clear which parts of the ADT form its public interface and which should remain private.

Some programming languages provide adequate support for this model of programming. C [KR88] allows the separation of interfaces from implementations and even the hiding of representations [Han96], though without enforcing it as programming discipline. Modula-3 [Nel91] and Standard ML [AM94] do a good job of both separating interfaces from implementations and
supporting ADTs. Java, in its native form, is lacking in both respects.

Java supports modular programming at both the class level and the package level. At the class level the interface facility of the language provides support for the model of modular programming in which interfaces are separate from their implementations. It has some notable deficiencies, such as the inability to describe constructors or static methods, but, mainly, classes are too fine-grained a structure to be particularly suitable as units of modularity for traditional modular programming.

Java uses the package mechanism to provide support for modularity above the class level. Java packages do not separate interface from implementation: the interface is derived implicitly from public keywords sprinkled throughout the implementation.

Aside from the traditional software engineering goals, module systems have recently been asked to fulfill additional roles as well. With the widespread use of mobile code (e.g., applets, plugins) it has become necessary to protect systems from damage that malicious mobile code might inflict, as well as to provide environments in which mutually untrusted groups of mobile code can run simultaneously but without danger of unwanted interaction. Since mobile applications (in Java) typically consist of several classes, it is natural that they be organized in modules. Even when this is not done explicitly, a collection of classes that comprises a mobile application is likely to share the same set of security properties and will, from the standpoint of the system within which it is running, in many respects be treated as a de facto module. If mobile code systems are to rely on modules to organize code, it is important for module systems to assist in providing the security functionality needed for mobile code, or at the very least not to interfere with other mechanisms used to provide security.

The Java package system is unsuited for this role. The combination of implicit interfaces and the lack of explicit membership lists makes it easy for a malicious attacker to take advantage of a system for running mobile code that bases its security facilities on Java packages.

Let us consider an example. Suppose a particular mobile application (i.e., package) is trusted by the system on which it is running. The application controls access to its components by declaring certain sensitive classes package-scope and letting clients access them only through public classes which filter out any undesired uses of the private classes. The deficiencies of the Java package system make this insecure. An attacker could write a class that declared itself to be part of the same package as the trusted application (this is possible because Java packages don’t have membership lists), which could then directly access the private classes of the trusted application, circumventing the filtering provided by the public classes, and use them to malicious ends.³ Our module system prevents any such security breach by using module description files which explicitly specify both the membership of a module and its public interface by listing all the classes that belong to each.

³ Since separate applets are normally loaded by different class loaders they reside in different name spaces. For the attack to work as described, the malicious class would have to be loaded by the same class loader that loaded the victim application.

There are other ways of solving the security problem posed by this example; for instance, by stack inspection [WF98]. A disadvantage of most of these schemes is that they require dynamic run-time checking and that they are needlessly restrictive. Our scheme, on the other hand, would prevent a hostile applet such as the one described from even linking with the trusted application.

The module description file in Figure 2 demonstrates the use of explicit export interfaces and membership lists. Only classes defined in the listed source files are considered to be part of the module. The module defines several classes, but only Graph, Node, and NodeList are visible to clients outside the module.

    Module
        Graph
        Node
        NodeList
    is
        Graph.java
        FlowGraph.java
        Node.java
        NodeList.java
        FlowNode.java
        GraphUtils.java

Figure 2: The module description file of a submodule of a register allocator.

Though a significant improvement from the standpoint of information-hiding and program organization, the interfaces of our module system don’t address the issue of separate compilation. The interfaces are merely lists of classes and do not describe their types, so an implementation cannot be type-checked against them. They present an improvement over the Java package system’s implicit interfaces by allowing the programmer to specify the sets of classes that form a module and its public interface. We do not see a suitably non-intrusive way of adding support for separate compilation to Java, but as our primary goal was to explore the security aspects of modular programming, we decided against extending and complicating our module system in an attempt to solve this problem.

Our approach to organizing modules is similar to, though simpler than, the mechanism for defining units in MzScheme [FF98], which does support separate compilation. But whereas the primary motivation in that work is extensibility and code reuse, we are more concerned with the security aspects of modular programming.

Hierarchical Scalability and Multiple Interfaces. The basic ways in which our modules support information hiding are not dissimilar from those offered by Java packages. Java’s module interfaces are implicit; ours are explicit, but our interface descriptions consist only of classes, and don’t describe public fields and methods of classes which are also part of a full interface. Though our module system is not powerful enough to fully describe the types of modules, it makes it simpler to control and enforce the visibility of member classes. The interfaces of both systems have similar access control capabilities: a class can be either publicly visible or visible only to other classes inside the same module. The feature that sets our module system off from Java packages, however, is the ability to structure modules so as to provide different views to different clients.

We often come across situations in which we would like a module to export a richer interface to a few select modules and a more restrictive one to everyone else. In a language like Standard ML a module can supply different export interfaces to different clients. Modula-3 also has that ability, though module interfaces in Modula-3 may not overlap in the sets of members they expose [Nel91]. Java’s methods of controlling accessibility (through making classes and their fields private, protected, package-scope, or public) aren’t expressive enough, so Java resorts to using a security manager to determine at run time whether a client is allowed to access a particular restricted class. The security manager suffers from a number of problems, from run-time overhead to its ability to interact only with the owner of the virtual machine and not the executing program. Its complexity and ambiguities have made it vulnerable to security breaches and made it difficult to reason about and form security policies [DFWB97].

Suppose that there are to be two views of module M: view V1 providing access to classes A, B, C, and V2 providing A, D. In our module system this is accomplished by making a module M0 containing (and exporting) A, B, C, D; a module M1 that imports (and re-exports via aliasing) M0.A, M0.B, M0.C; and a module M2 that imports and re-exports M0.A, M0.D. There are no wrapper classes: the class M2.A is the same class as M1.A.

This is an instance of hierarchical modularity, which is the idea of grouping several modules and attaching to each group its own interface. The group is itself a module whose publicly visible members can be imported by other modules. The members of the group can communicate among themselves through their own interfaces, which can be much less restrictive than the group’s top-level interface. This approach can be applied repeatedly to create a hierarchy of modules. For a comprehensive treatment of hierarchical modularity see Blume and Appel [BA99]. We use a similar approach for Java.

Our module system supports hierarchical modularity by allowing modules to explicitly list the sub-modules on which they depend. Modules can export not only classes that have been defined in their own source files, but also classes that have been defined in imported modules. When its module description file begins with the keyword Library, compiling a module produces a JAR file that includes the bytecode of all the imported modules, which are then kept hidden by the export interface.

Figure 3 is a module description file of the main module of a compiler; it illustrates this approach. The main module imports all the sub-modules that implement different parts of the compiler and defines only a few classes that tie the sub-modules together into a working system. One of the modules it imports is Codegen, the
code-generation module. Codegen defines and exports the classes Access, AccessList, Codegen, Frame, and Proc. Though these are visible to the source code of the top-level module, they are not publicly accessible. Only the class Main, the top-level interface to the compiler, is left visible as the export interface of the group. The hierarchical structure is transparent to a user; he has no way of knowing that the compiler module is composed of sub-modules.

    Library
        Main
    is
        Main.java
        NullOutputStream.java
    imports
        Codegen ../Codegen/
        RegAlloc ../RegAlloc/
        Absyn ../Absyn/
        Tree ../Tree/
        ...
        Types ../Types/
        Util ../Util/

Figure 3: The module description file of the top-level module of a compiler.

Apart from the need for modules to support multiple interfaces, there is another reason for introducing hierarchical modularity. Windows 95 has over 10,000,000 lines of code [BP98]. If it were structured in just a two-level framework of classes and modules, either there would be more than 1,000 modules or each module would have more than 10,000 lines. This strongly suggests that a hierarchy of modules is necessary.

Name-Space Management. An additional software engineering benefit is our module system’s flexible and convenient name-space management scheme. Although the naming convention used with Java packages suggests that they support a hierarchical naming scheme, packages with names like java.awt and java.awt.color have no more in common than packages with completely different names.

One of the reasons for grouping code into packages is to avoid name clashes between classes. But Java packages are themselves named, so that merely lifts the problem to the package level. Instead of a name clash between two classes called Parser, we might have a clash between two classes called Util.Parser. The accepted way of solving this problem is to give packages long, unique names. This isn’t a particularly appealing solution, however, since it interferes with the package system’s ability to provide convenient name-space management; classes must now either be referred to individually using their cumbersome package name (e.g., java.awt.image.renderable.RenderableImage) or be imported en masse using the * notation, which again introduces the possibility of name clashes because the names of the imported classes are stripped of their unique package prefixes.

Our modules, on the other hand, are not named, so they don’t suffer from this problem. Modules are assigned names only via import statements of individual module description files; this type of name-space thinning makes it easy to keep their names short and simple. In source code the names of external classes are prefixed with the name of their module, so name clashes between classes with the same name are easily avoided.

The module system we present, of which the name-space management scheme is a part, is patterned upon the module system of Standard ML of New Jersey [BA99]. Transplanting such a module system to Java required some extensions over the SML-NJ module system. Java, for example, does not support in its core language the renaming of imported structures, so this task had to be passed on to the module system. The ability of SML-NJ to provide fully-defined, rich interfaces is mostly a feature of the core language, and we could not reproduce it in our module language without making it undesirably complex. Our module system consequently lacks separate compilation and SML’s powerful module-level ADTs.

Writing secure applications in Java involves limiting the visibility of classes and preventing run-time inspection of objects by methods such as cloning, serialization, and deserialization [MF98b]. Our module system is a significant improvement over the Java package system in addressing the first issue.

5 Secure Linking

The behavior of a program fragment depends not only on its own code but also on the libraries with which it is linked. Under the static linking model, compiling
and linking a piece of code generates an executable that
behave in the expected manner.
is fully self-contained. The libraries with which the
Stronger guarantees are needed, especially when a sys-
program is linked, as well as the finished product, are
tem must trust the behavior of a particular executable,
available for the programmer’s perusal. He therefore has
such as an applet. Java often uses code signing for such
good reason to expect that the self-contained executable
purposes [PD98, MF98a]. But what is the meaning
will behave in the desired manner, even if it is executed
of a signature on an applet? In Sun’s system, from
on a machine that has a different software environment
the signature of code C by key K
and a different set of libraries.
A we can reasonably
conclude that A signed C, and nothing more. We don’t
Today most executables aren’t fully self-contained,
know what properties A is claiming about C. However,
but need to dynamically link with system libraries when
code signing does provide a way to identify the author
they are executed. This provides us with the flexibility
of a piece of code, and thus to attribute blame after the
to update or change parts of all programs on a system
fact.
simply by swapping in a new module. Should we swap in
While providing some protection to the virtual ma-
a new I/O library to replace an old one, all executables
chine against code that runs on it, code signing provides
that use that library will automatically have access to
no guarantees to code about the virtual machine, nor
the updated code. If the executables were statically
to different code fragments about each other. Ironically,
linked, on the other hand, we would have to relink each
current code signing practices allow a programmer to
of them—inefficient and inconvenient, at best.
be held responsible for the behavior of his code, while
Dynamic linking has become very popular, especially
not providing him with the means of ensuring that the
with languages such as Java, which adopt it as a
system on which his code is running is itself behaving in
key feature [LY99]. But despite the proliferation of
the expected manner.
dynamic linking, only a few attempts have been made to
We allow the programmer to require certain properties
extend the model of correctness that holds for statically
of the modules on which his code depends.
If the
linked code [Dea97, Dea99]. Programmers believe that
required properties are not present, our system will
programs will behave in their intended manner even
not allow the program to link or execute. If they are
though much of the programs’ behavior depends on the
present, the programmer can more realistically expect
system libraries of foreign and unknown systems.
that his program, once linked, will behave in the desired
This belief is based mostly on the existence of stan-
manner. Furthermore, the programmer can annotate his
dards that seek to ensure the uniformity of library code
own module with certain guarantees which are held to be
(e.g., all Java virtual machines and their associated
valid once linking has succeeded. These annotations are
system classes are expected to meet Sun’s standard).
added to a module, and digitally signed, after it has been
There are very few guarantees, however, about adher-
compiled. We thus establish a system in which a module
ence to a standard that are expressed in a way that
can assert that if the modules it imports can guarantee
programs can understand. The guarantees are largely
certain behavioral properties, then it, too, will behave
implicit and informal or written in English, and can’t be
in a certain manner.
reasoned about or manipulated at the level of program
code. Additionally, standardization does not apply when
The properties our system supports are keywords
linking with third-party libraries.
The only widely
that represent statements made by an author about the
used method of ensuring safe linking, and the method
behavior of his code. Our property-annotation frame-
used by Java, is type-checking the interfaces between
work does not attempt to relate the claimed properties
program fragments. Recent research has formally shown
to actual program behavior, nor does it attempt to
that strongly typed mobile code has desirable security
classify properties or regulate their assignment. What
properties [LR98] and provided ways of ensuring that
we provide is a mechanism which allows statements
type safety is preserved by the linking process [GM99].
about program behavior to be mechanically attached to
Still, though type-checking is useful in ensuring that
modules and allows intermodule linking to be contingent
programs and libraries at least agree on the types they
upon the presence of such statements.
are using, it falls far short of guaranteeing that code will
A programmer, for example, may want a com-
7

piler that he is writing to have the property
Module
DoesNotPopUpAnyMisleadingDialogBoxes.
His com-
Main
piler, however, uses several third-party modules, one of
is
which is the parser module. The programmer does not
Main.java
have access to the source code of the parser; even if
NullOutputStream.java
observational evidence were to suggest that the parser
imports
behaves in the desired manner, there is no guarantee
...
that the compiler might not eventually be executed on
Parse ../Parse/ DoesNotPopUpAnyMisleadingDialogBoxes
a host where it would link with a different third-party
...
parser which might exhibit different behavior.
Figure 4: The module description file of the top-level
module of a compiler, annotated with additional linkage directives.

The module description file of the top-level module of his compiler (Figure 4) can specify that it should link with the parser only if the parser is also annotated with the DoesNotPopUpAnyMisleadingDialogBoxes property. If the parsing module is not annotated with that property, the compiler will not link or execute. Now it is reasonable to annotate the top-level module’s JAR file with the DoesNotPopUpAnyMisleadingDialogBoxes property.

Our property tool will take a JAR file, property name, and private key. It will cryptographically hash the triple <byte code, module description, property name>, sign with the key, and add this certificate to the JAR file. Thus, the JAR can accumulate certificates of the form “key K says the module has property P.”

A hierarchical module system is integral to our scheme of attaching properties to modules. Structuring modules in dependency graphs makes it possible for a top-level module to unambiguously declare which properties it requires of its subordinate modules in order to be able to provide certain properties of its own. A hierarchically built system also makes it much easier to reason about the properties of modules by allowing the problem to be subdivided into a number of smaller ones. Explicit module descriptions are important to this scheme because they provide a centralized framework for requiring subordinate modules to hold certain properties.

Our property and signature system is a small step in the right direction; but we imagine that one might trust certain signers for some properties and not others. We are working on a more powerful calculus of signers and properties.

Our use of explicit import interfaces restricts the flexibility of dynamic loading. In Java it is possible, at run time, to load classes whose names are unknown at compile time. Explicit import interfaces require the programmer to specify, prior to compilation, the locations of the modules on which his code depends. Though class names do not have to be specified in the import interface, the locations of the modules, at least, need to be known at compile time, which precludes some interesting uses of dynamic loading.

6 Implementation

We have implemented a prototype that illustrates the features of our module system. Our prototype can be used with existing Java compilers and virtual machines. Our modules can be translated into Java packages. Some of the features of our module system, however (in particular its ability to place various constraints on linking), cannot be expressed just using Java bytecode. Because of this, our prototype implementation needs to provide additional features both to the compiler and to the virtual machine.

Compilation. The compilation phase of our implementation is a wrapper around a standard Java compiler that consists of a preprocessing and a postprocessing step.

Figure 5: The implementation of our system. (Diagram; labels recovered from the figure: modularized Java source, module description, transform A, Java source code, Java compiler, Java byte code, transform B, JAR file, the Internet, transform C, JVM.)

The job of the preprocessing phase (Figure 5, transform A) is to translate the source code used in our module system into equivalent standard Java source code. The first step of this process is to represent our modules as Java packages. Each module is assigned an artificially generated package name, mapping the hierarchical set of modules into a flat name space of packages. We rely on the assignment of artificial package names to avoid name clashes. In addition to assigning each
module a package name and adding appropriate package declarations to source files, this step must also translate class references made through identifiers introduced in the module description file (henceforth called symbolic names) into class references that can be interpreted by a Java compiler (henceforth called actual names). Because identifiers in Java are classified into several name spaces, and to detect and avoid conflicts with locally bound identifiers, we have to parse the source code to determine which tokens need to be changed. As qualified names from the original source code are resolved by replacing identifiers introduced in module description files with the package names of the modules they represent, our compilation manager ensures that the restrictions imposed by export interfaces and digital signature requirements are obeyed.

At this point our modules have been translated into ordinary Java source code and can be compiled with any standard Java compiler, without the loss of any functionality added by our module system.

The compilation phase also has a post-compilation step (Figure 5, transform B). Our modules can export symbols that have been defined in imported modules, so it is possible that several module description files need to be traversed to discover to which class a qualified identifier is pointing. This resolved name is the one used when the code is being compiled. Consequently, the bytecode of one module can depend on the source code of several; from a viewpoint that favors separate compilation, this is undesirable.

To allow separate compilation of modules, we replace the resolved references in the compiled bytecode with their symbolic names. Thus all external references are again made only through identifiers defined in the module description files, releasing each compiled module from unwanted dependencies on the source code of others.

There are cases, unfortunately, in which it is difficult to restore a resolved identifier to its original name. A particular module description file, for example, might bind two different identifiers to the same class. Preprocessing would replace the two different identifiers with the same new one. After compilation, we might not be able to discover which of the two is which. In this situation our compilation manager arbitrarily picks one and adds an annotation to the module’s JAR file. This annotation can later be used to check whether the bindings that were used at compile time are still valid, and otherwise warn that recompilation is necessary.

Figure 6: Resolving class references. (Diagram of modules A, B, and C; the description of module A contains the binding Util = B, and the description of module B contains the binding Foo = C.Bar.)

Figure 6 shows an example of name rewriting. To resolve the reference to Util.Foo, module A first consults its module description file to discover that the identifier Util is bound to module B. From module B’s description file we learn that class Foo is reexported rather than defined in B, and that the real name of the class is C.Bar. The reference to Util.Foo is replaced by a reference to C.Bar. But since module C is part of the hidden implementation of module B, it is possible that it may change after module A has been compiled. After compiling module A, therefore, the rewritten reference is returned to its original name, Util.Foo.
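
As an illustration only, the lookup traced in Figure 6 can be sketched in Java as follows. The ModuleDescription class, its fields, and the ResolveDemo driver are hypothetical stand-ins for a parsed module description file and are not taken from the prototype. The sketch uses module names in place of the artificially generated package names, and it omits the export-interface and digital-signature checks as well as any guard against cyclic reexports.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a parsed module description file. */
class ModuleDescription {
    final String name;
    /** Import bindings: identifier -> module, e.g. "Util" -> B. */
    final Map<String, ModuleDescription> imports = new HashMap<>();
    /** Names of the classes defined in this module itself. */
    final Set<String> defined = new HashSet<>();
    /** Reexports: exported class -> symbolic name it stands for, e.g. "Foo" -> "C.Bar". */
    final Map<String, String> reexports = new HashMap<>();

    ModuleDescription(String name) {
        this.name = name;
    }

    /** Resolves a symbolic name ("Util.Foo"), as seen from this module, to an actual name. */
    String resolve(String symbolicName) {
        int dot = symbolicName.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("expected Identifier.Class: " + symbolicName);
        }
        String identifier = symbolicName.substring(0, dot);
        String className = symbolicName.substring(dot + 1);
        ModuleDescription target = imports.get(identifier);
        if (target == null) {
            throw new IllegalArgumentException(name + " does not bind " + identifier);
        }
        return target.export(className);
    }

    /** Returns the actual name of a class this module exports, following reexports. */
    private String export(String className) {
        if (defined.contains(className)) {
            return name + "." + className; // defined here, so the chain ends
        }
        String origin = reexports.get(className);
        if (origin == null) {
            throw new IllegalArgumentException(name + " does not export " + className);
        }
        return resolve(origin); // reexported: keep tracing from this module's point of view
    }
}

/** Rebuilds the three modules of Figure 6 and resolves Util.Foo from module A. */
class ResolveDemo {
    public static void main(String[] args) {
        ModuleDescription c = new ModuleDescription("C");
        c.defined.add("Bar");
        ModuleDescription b = new ModuleDescription("B");
        b.imports.put("C", c);
        b.reexports.put("Foo", "C.Bar");
        ModuleDescription a = new ModuleDescription("A");
        a.imports.put("Util", b);
        System.out.println(a.resolve("Util.Foo")); // prints C.Bar
    }
}
```

Restoring the symbolic name after compilation would amount to keeping the pair (Util.Foo, C.Bar) and substituting in the opposite direction.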

Execution in the Virtual Machine. Dynamic linking in the Java virtual machine is managed by class loaders. Class loaders were intended to be extensible to allow the virtual machine to load bytecode from sources other than the local file system. They can also be modified, however, to support arbitrary mappings from class names to objects, or even modify the bytecode of the classes they load. These features make them useful for adding advanced language features to Java without modifying the virtual machine [AFM97].

Each module description file sets up a mapping from identifiers to the classes they represent. The same identifier can therefore represent different classes in different modules. A request to load a certain class, too, may be allowed or denied depending on whether the class is signed by the digital signature required by the calling module. To deal with this issue, we have to provide the Java virtual machine with the ability to answer loadClass requests differently depending on the module from which they originate, which it otherwise has no way of doing.

Since loadClass requests are handled by the class loader that loaded the class that is making the call, our solution is to extend the ClassLoader class with the functionality we desire. We instantiate a new copy of this class loader for every module that is loaded by the virtual machine. Our class loader uses the module description file to set up the appropriate class environment and control linking in the manner specified by export filters and digital signature requirements. After the virtual machine is initialized, a wrapper class loads our customized class loader, which then loads the modules to be executed.

Each of our class loaders has direct access only to its own module description file. When a class requests that a class from a different module be fetched, the requestor’s class loader passes the request to the appropriate module’s class loader. That class loader, in turn, verifies whether the request can be fulfilled vis-à-vis that module’s export interface and property requirements. If the requested class is merely being reexported, the request will be passed on to the next class loader in the chain; otherwise, the requested class will be returned.

Name Hacking. The process we described for running code written using our module system isn’t quite complete. The ability to customize class loaders can easily be misused. If a class loader, for example, were asked twice to fetch the same class and returned two different objects, the type system would be broken and the security of the system would be compromised [Dea97, Dea99]. Newer Java virtual machines have instituted stricter name-space management policies to guard against such breaches [LB98].

The full name of every compiled class is encoded in its bytecode. Among other restrictions, new virtual machines verify that the encoded name of a class returned in response to a loadClass request matches the name with which loadClass was invoked. Class names in our module system contain identifiers defined in module description files; these names may bear little relation to the actual package names assigned to the classes they reference. With the new security checks, it is no longer possible for our class loader to naively redirect loadClass requests to classes whose names don’t match the requested ones.

Our solution is to rewrite the bytecode of compiled modules, replacing symbolic names (those defined through module description files) with actual ones. This is done while a class is being loaded into the virtual machine, before linking or bytecode verification (Figure 5, transform C). The procedure for resolving symbolic names is virtually identical to the one we use during preprocessing when source code is rewritten.

Since modules may reexport classes, resolving symbolic names requires tracing through module description files to locate the module in which a given class is defined. This is necessary in order to find which package name has been assigned to the module to which that class belongs. An unfortunate consequence, therefore, of the bytecode rewriting is a slight restriction on the laziness of dynamic linking. A Java virtual machine might delay the loading and linking of a referenced class until the point of execution at which the class is actually needed. Our rewriting technique, on the other hand, resolves all references at load time, so at that point it must access the module description files of all referenced modules. Since it doesn’t need to actually load classes from the referenced modules, the chain of modules that need to be accessed for a particular reference to be resolved ends as soon as the module that defines the referenced class has been found.

An alternative, simpler implementation might involve
changing the virtual machine to remove the security checks that make rewriting bytecode necessary. Care would have to be taken, however, to prevent the security problems against which these measures guard. Our approach doesn’t involve modifying the Java virtual machine itself, which makes it portable across different implementations.

The Reflection API. Unsurprisingly, our system interacts badly with Java’s reflection API [Mic98]. The purpose of the reflection API, including the getName and forName methods of java.lang.Class, is to discern run-time information about classes that may not be available at compile time. Regardless of our module system, it is dubious whether such a facility should be available for use by untrusted applets. Though it is often convenient for the programmer, use of the reflection API undermines the goals of programming with ADTs, revealing information that may be purposefully hidden by subclassing and the use of Java interfaces.

The security features of class loaders require that the implementation of our naming scheme differ considerably from the view presented to the programmer. This effectively renders the forName and getName methods useless. The former is used to look up and load the class with a given name. But the name a programmer would use in source code has been changed during compilation and kept hidden. Even though a class loader could resolve the requested name to its new version, the security restrictions placed on class loaders would prevent it from returning the correct object. getName, on the other hand, would reveal the internal names of objects. Classes that form the interface of a module may be either local or imported from elsewhere; revealing one or the other would be a breach of security.

Redirecting method calls from forName, getName, and other methods of the reflection API to specialized functions that would prevent certain information from being revealed might restore most of the functionality of the API. It would require extensive bookkeeping and indirection, however, and would not be completely transparent to the user. For the time being we have decided to set aside concerns about reflection.

7 Conclusions and Future Work

Our module system is based on explicit module descriptions. Membership lists and explicit export interfaces protect module integrity. Unnamed modules and declarative import statements provide simple and convenient name-space management. Variable levels of access to modules are supported by arranging modules in hierarchies. Increased control over the linking process, implemented by allowing import statements to require modules to have specific properties, helps ensure correct program behavior in the presence of dynamic linking.

Any attempt to develop a secure programming environment is likely to be based on a module system. In the case of Java, a module system should provide modularity at the level of Java packages, but should also provide explicit interfaces, which Java packages do not. Explicit module descriptions seem to be a very useful feature, both for providing an increased level of security and for simplifying the task of designing and understanding modular software systems. Class loaders play a key role in security; our module system uses them in a principled and declarative way to enforce information-hiding. We have demonstrated that the Java virtual machine is sufficiently powerful to support such an advanced module system without modification.

The reflection API is a serious obstacle to sophisticated module systems that support nesting and reexporting and have opaque interfaces. Although there may be ways to limit the impact of the reflection API on the security of such systems, the purpose of the API is contradictory to the goals of using ADTs, and it would be preferable if its necessary features were provided in a different way.

Dynamic linking is an area that deserves more study. It is important to provide guarantees (ones that programs can reason about) about the behavior of dynamically linked libraries. Only thus can we trust programs that rely on them to behave in their intended manner. Our module system provides a good framework for annotating code with such guarantees. We demonstrate a method for allowing interrelated modules to require certain rudimentary properties of each other. We plan to continue work on making these linking requirements more expressive and giving modules even more control over the linking process.
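
To make the notion of a linking requirement concrete, a certificate of the form “key K says the module has property P” might be produced and checked roughly as in the following Java sketch. Everything here is an assumption made for illustration: the PropertyCertificate class and its method names are hypothetical, the paper does not name a signature algorithm (SHA256withRSA from the standard java.security API is used), and the length-prefixed encoding of the triple <byte code, module description, property name> is one possible choice among many.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical sketch of a certificate "key K says the module has property P". */
class PropertyCertificate {
    // Assumption: the algorithm is chosen for illustration; it hashes the input, then signs the hash.
    private static final String ALGORITHM = "SHA256withRSA";

    /** Signs the triple <byte code, module description, property name>. */
    static byte[] sign(byte[] bytecode, byte[] description, String property, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(key);
            feed(signer, bytecode, description, property);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks that the certificate was issued by the given key for exactly this triple. */
    static boolean verify(byte[] bytecode, byte[] description, String property,
                          byte[] certificate, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(key);
            feed(verifier, bytecode, description, property);
            return verifier.verify(certificate);
        } catch (GeneralSecurityException e) {
            return false; // a malformed certificate or unusable key counts as a failed check
        }
    }

    /** Feeds each component with a length prefix so that component boundaries are unambiguous. */
    private static void feed(Signature s, byte[] bytecode, byte[] description, String property)
            throws GeneralSecurityException {
        byte[][] parts = { bytecode, description, property.getBytes(StandardCharsets.UTF_8) };
        for (byte[] part : parts) {
            s.update(ByteBuffer.allocate(4).putInt(part.length).array());
            s.update(part);
        }
    }

    /** Generates a throwaway RSA key pair for experimentation. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, an import statement that requires a property would succeed only if verify returns true for the imported module's bytecode and description, the required property name, and a key the importer trusts. Because the property name is part of the signed data, a certificate issued for one property cannot be reused for another.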

References

[AFM97] Ole Agesen, Stephen N. Freund, and John C. Mitchell. Adding type parameterization to the Java language. In Object Oriented Programming: Systems, Languages, and Applications (OOPSLA), October 1997.
[AM94] Andrew W. Appel and David B. MacQueen. Separate compilation for Standard ML. In ACM Conference on Programming Language Design and Implementation, pages 13–23, June 1994.
[BA99] Matthias Blume and Andrew Appel. Hierarchical modularity. To appear in ACM Transactions on Programming Languages and Systems, 1999.
[BP98] Ronald Baecker and Blaine Price. The early history of software visualization. In John Stasko, John Domingue, Marc Brown, and Blaine Price, editors, Software Visualization, chapter 2, pages 29–34. MIT Press, 1998.
[Car97] Luca Cardelli. Program fragments, linking, and modularization. In 24th ACM SIGPLAN-SIGACT Symposium on the Principles of Programming Languages, pages 266–277, January 1997.
[Dea97] Drew Dean. The security of static typing with dynamic linking. In Fourth ACM Conference on Computer and Communications Security, Zurich, Switzerland, April 1997.
[Dea99] Richard Drews Dean. Formal Aspects of Mobile Code Security. PhD thesis, Princeton University, January 1999.
[DFWB97] Drew Dean, Edward W. Felten, Dan S. Wallach, and Dirk Balfanz. Java security: Web browsers and beyond. In Dorothy E. Denning and Peter J. Denning, editors, Internet Besieged: Countering Cyberspace Scofflaws. ACM Press, October 1997.
[FF98] Robert Bruce Findler and Matthew Flatt. Modular object-oriented programming with units and mixins. In Proceedings of the third ACM SIGPLAN International Conference on Functional Programming, pages 94–104, September 1998.
[GJS96] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. The Java series. Addison-Wesley, 1996.
[GM99] Neal Glew and Greg Morrisett. Type-safe linking and modular assembly language. In Conference Record of POPL ’99: The 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 250–261, January 1999.
[Han96] David R. Hanson. C Interfaces and Implementations: Techniques for Creating Reusable Software. Addison-Wesley Professional Computing Series. Addison-Wesley, 1996.
[KR88] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice Hall Software Series. Prentice Hall, 2nd edition, 1988.
[LB98] Sheng Liang and Gilad Bracha. Dynamic class loading in the Java Virtual Machine. ACM SIGPLAN Notices, 33(10):36–44, October 1998.
[LR98] Xavier Leroy and François Rouaix. Security properties of typed applets. In Conference Record of POPL ’98: The 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 391–403, 19–21 January 1998.
[LY99] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison Wesley, 2nd edition, 1999.
[MF98a] Gary McGraw and Edward Felten. Securing Java: Getting Down to Business with Mobile Code. John Wiley and Sons, 1998.
[MF98b] Gary McGraw and Edward Felten. Twelve rules for developing more secure Java code. Java World, December 1998. http://www.javaworld.com/javaworld/jw-12-1998/jw-12-securityrules.html.
[Mic98] Sun Microsystems. Java core reflection. http://java.sun.com/products/jdk/1.2/docs/guide/reflection/spec/java-reflection.doc.html, 1998.
[MTH90] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.
[Nel91] Greg Nelson, editor. Systems programming with Modula-3. Prentice Hall Series in Innovative Technology. Prentice Hall, 1991.
[PD98] Monica Pawlan and Satya Dodda. Signed applets, browsers, and file access. Java Developer Connection, April 1998. http://developer.java.sun.com/developer/technicalArticles/Security/Signed/index.html.
[WF98] Dan S. Wallach and Edward W. Felten. Understanding Java stack inspection. In IEEE Symposium on Security and Privacy, May 1998.