Java

Java.net using Drupal

With the help of Cognisync, Sun Microsystems converted Java.net, the website of the Java community, to Drupal. The previous version of Java.net was custom built by O'Reilly Media. It is an interesting choice for a site devoted to Java and, needless to say, a great testimonial for Drupal.

Java performance evaluation through rigorous replay compilation

Our paper Java Performance Evaluation through Rigorous Replay Compilation (PDF, 1.9MB) has been accepted for publication at OOPSLA'08. This is joint work with Andy Georges and Lieven Eeckhout that I worked on before I got my PhD and left the university to start Acquia.

This is good news because OOPSLA, which is short for the ACM conference on Object-Oriented Programming, Systems, Languages, and Applications, has incubated many state-of-the-art technologies, including design patterns, refactoring, aspect-oriented software development, dynamic compilation and optimization, the Unified Modeling Language, and more.

Paper abstract

A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, non-determinism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicates the Java performance benchmarking process.

Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling non-determinism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine.

This paper argues that, in contrast to current practice, which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.
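
To make the matched-pair idea concrete, here is a minimal sketch in Java. It is not the code from the paper: the class name, method names, and the measurements are all hypothetical. It treats the execution times before and after a change, measured under the same compilation plan, as a pair, and builds a confidence interval on the mean of the per-plan differences. The JDK has no Student's t-distribution, so the sketch assumes the caller looks up the critical value (2.776 for a 95% interval with four degrees of freedom).

```java
final class MatchedPairComparison {

    /** Mean of the per-plan differences plus the bounds of its confidence interval. */
    static final class Interval {
        final double mean;
        final double lower;
        final double upper;

        Interval(double mean, double lower, double upper) {
            this.mean = mean;
            this.lower = lower;
            this.upper = upper;
        }

        /** True when the interval excludes zero, i.e. the difference is statistically significant. */
        boolean excludesZero() {
            return lower > 0.0 || upper < 0.0;
        }
    }

    /**
     * Builds a confidence interval for the mean of (before[i] - after[i]).
     * Index i identifies one compilation plan, measured once before and once after the change.
     * tCritical is the Student's t value for n - 1 degrees of freedom (assumed to be supplied by the caller).
     */
    static Interval pairedInterval(double[] before, double[] after, double tCritical) {
        if (before.length != after.length || before.length < 2) {
            throw new IllegalArgumentException("need at least two matched pairs");
        }
        int n = before.length;
        double[] diff = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            diff[i] = before[i] - after[i];
            sum += diff[i];
        }
        double mean = sum / n;

        // Sample standard deviation of the differences (n - 1 in the denominator).
        double squared = 0.0;
        for (double d : diff) {
            squared += (d - mean) * (d - mean);
        }
        double stdDev = Math.sqrt(squared / (n - 1));

        double halfWidth = tCritical * stdDev / Math.sqrt(n);
        return new Interval(mean, mean - halfWidth, mean + halfWidth);
    }

    public static void main(String[] args) {
        // Hypothetical execution times in seconds for five compilation plans.
        double[] before = {10.2, 11.5, 9.8, 12.1, 10.9};
        double[] after = {9.9, 11.1, 9.6, 11.6, 10.6};

        Interval ci = pairedInterval(before, after, 2.776);
        System.out.printf("mean improvement %.3f s, 95%% CI [%.3f, %.3f], significant: %b%n",
                ci.mean, ci.lower, ci.upper, ci.excludesZero());
    }
}
```

Because each difference is taken within one compilation plan, the variation between plans cancels out of the interval, which is why pairing can get by with fewer plans than an unpaired analysis.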

The impact of multicore architectures on software

After my internal PhD defense next Thursday, Michael Hind (a member of my PhD dissertation committee) will give a presentation about the impact of multicore architectures on software. Feel free to attend!

Abstract

Mainstream computer vendors have announced two dramatic changes in their future architectures. First, the clock speed and the amount of cache memory per processor will dramatically change. Namely, the exponential increase in clock frequency we've experienced over the past decades will cease to occur, and in some cases processor speeds will decrease. Also, the relative amount of cache memory for a processor will decrease. Second, there will be an exponentially increasing number of processor cores on a chip.

These changes present two unprecedented challenges to the software stack:

  1. How does the software deal with the loss of single-threaded performance and cache memory, and
  2. How does the software utilize the additional capabilities provided by multiple cores on a chip?

In his talk, Michael will argue why these challenges present great opportunities for software optimization and suggest some approaches to address these fundamental problems.

PhD dissertation milestone

I just sent the members of my PhD dissertation committee their copy of my dissertation. Yay!

Writing a PhD dissertation has been a completely new experience, and a challenging one for sure. Writing for weeks on end is hard. However, mailing your PhD dissertation to the members of your PhD dissertation committee is fun, and warrants a blog post.

The title of my PhD dissertation will be Profiling techniques for Java performance analysis and optimization. I'm fortunate to have two of the world's best Java people on my PhD dissertation committee -- James Gosling (the inventor of Java and Vice President at Sun Microsystems) and Michael Hind (Senior Manager of the Programming Technologies Department at IBM Research) -- so I'm looking forward to what they have to say about it.

Next up is my PhD defense. That and getting Drupal 6 out, of course.

Statistically Rigorous Java Performance Evaluation

The following paper has also been accepted for publication at OOPSLA 2007: Statistically Rigorous Java Performance Evaluation (PDF, 1.6MB).

Paper abstract

Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects.

There exist a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times.

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
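
As a rough illustration of why the choice of reporting method matters, the sketch below compares two sets of execution times, first by "best of N" and then by mean with a confidence interval. It is not the JavaStats tool: the class, its methods, and the measurements are hypothetical. As the JDK has no Student's t-distribution, the critical value is assumed to be supplied by the caller (2.776 for a 95% interval over five runs).

```java
final class RunSummary {
    final double best;
    final double mean;
    final double lower;
    final double upper;

    private RunSummary(double best, double mean, double lower, double upper) {
        this.best = best;
        this.mean = mean;
        this.lower = lower;
        this.upper = upper;
    }

    /**
     * Summarizes execution times from several VM invocations.
     * tCritical is the Student's t value for times.length - 1 degrees of freedom
     * (assumed to be looked up by the caller).
     */
    static RunSummary of(double[] times, double tCritical) {
        if (times.length < 2) {
            throw new IllegalArgumentException("need at least two measurements");
        }
        int n = times.length;
        double best = Double.POSITIVE_INFINITY;
        double sum = 0.0;
        for (double t : times) {
            best = Math.min(best, t);
            sum += t;
        }
        double mean = sum / n;

        // Sample standard deviation (n - 1 in the denominator).
        double squared = 0.0;
        for (double t : times) {
            squared += (t - mean) * (t - mean);
        }
        double stdDev = Math.sqrt(squared / (n - 1));

        double halfWidth = tCritical * stdDev / Math.sqrt(n);
        return new RunSummary(best, mean, mean - halfWidth, mean + halfWidth);
    }

    /** True when the two confidence intervals share at least one point. */
    boolean overlaps(RunSummary other) {
        return lower <= other.upper && other.lower <= upper;
    }

    public static void main(String[] args) {
        // Hypothetical execution times in seconds, five VM invocations each.
        RunSummary a = RunSummary.of(new double[]{10.0, 10.4, 10.2, 10.6, 10.3}, 2.776);
        RunSummary b = RunSummary.of(new double[]{9.8, 10.9, 10.5, 10.7, 10.6}, 2.776);

        System.out.printf("best:  A=%.2f B=%.2f%n", a.best, b.best);
        System.out.printf("mean:  A=%.2f B=%.2f%n", a.mean, b.mean);
        System.out.printf("95%% CI: A=[%.2f, %.2f] B=[%.2f, %.2f], overlap: %b%n",
                a.lower, a.upper, b.lower, b.upper, a.overlaps(b));
    }
}
```

With these made-up numbers, "best of N" favors B and the mean favors A, while the overlapping intervals suggest the data do not support either conclusion.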

Why PHP (and not Java)?

Almost every week or so, someone asks me: Why PHP? Apparently, you are doing Java too. So why not Java? Do you regret the fact that you wrote Drupal in PHP?

The answer?

No, I don't regret the choice of PHP. Both languages will get the job done, but Drupal's main target audience is not conservative verticals (government, healthcare, banking).

The web is built by millions of individuals, many of whom are amateurs. They continuously update, tweak and rebuild their websites. Scripting languages like PHP lend themselves to that, and are widely available at affordable cost. Sun, on the other hand, failed to make Java accessible to amateurs.

It would have been very difficult to get critical mass if Drupal had been written in Java.

© 1999-2009 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.