We are having problem connecting to our Java applications running in Amazon's EC2 cluster.
It turns out that the problem was a combination of two missing settings. The first forces the JRE to prefer ipv4 and not v6. This was necessary (I guess) since we are trying to connect to it via a v4 address:
-Djava.net.preferIPv4Stack=true
The real blocker was the fact that JMX works by first contacting the RMI port which responds with the hostname and port for the JMX client to connect. With no additional settings it will use the local IP of the box which is a 10.X.X.X
virtual address which a remote client cannot route to. We needed to add the following setting which is the external hostname or IP of the server -- in this case it is the elastic hostname of the server.
-Djava.rmi.server.hostname=ec2-107-X-X-X.compute-1.amazonaws.com
The trick, if you are trying to automate your EC2 instances (and why the hell would you not), is how to find this address at runtime. To do that you need to put something like the following in our application boot script:
# get our _external_ hostname
RMI_HOST=`wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname`
...
java -server
-Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=$RMI_HOST
-jar foo.jar other parameters here > java.log 2>&1
The mysterious 169.254.169.254
IP in the wget
command above provides information that the EC2 instance can request about itself. I'm disappointed that this does not include tags which are only available in an authenticated call.
I initially was using the extern ipv4 address but it looks like the JDK tries to make a connection to the server-port when it starts up. If it uses the external IP then this was slowing our application boot time until that timed out. The public-hostname resolves locally to the 10-net address and to the public-ipv4 externally. So the application now is starting fast and JMX clients still work. Woo hoo!
Hope this helps someone else. Cost me 3 hours today.
To force your JMX server to start the server and the RMI registry on designated ports so you can block them in the EC2 Security Groups, see this answer:
How to close rmiregistry running on particular port?
Edit:
We just had this problem re-occur. It seems that the Java JMX code is doing some hostname lookups on the hostname of the box and using them to try to connect and verify the JMX connection.
The issue seems to be a requirement that the local hostname of the box should resolve to the local-ip of the box. For example, if your /etc/sysconfig/network
has HOSTNAME=server1.foobar.com
then if you do a DNS lookup on server1.foobar.com
, you should get to the 10-NET virtual address. We were generating our own /etc/hosts
file and the hostname of the local host was missing from the file. This caused our applications to either pause on startup or not startup at all.
Lastly
One way to simplify your JMX creation is to use my SimpleJMX package.