Friday, 23 April 2010

CONM6009E: The database is unable to get a connection to the database from DataSource

Another day, another test, another error!

We were running a stress test through our WAS systems that connect to an Oracle database on AIX. When we got a large number of concurrent requests we ended up getting the following error in our WAS logs:

CONM6009E: The database is unable to get a connection to the database from DataSource

We assumed at first that we had not sized our connection pools correctly, so we turned on PMI and checked the size of the connection pools, but found we weren't hitting the connection pool limits. We then checked the Oracle database, which seemed fine but had logged a message stating that the maximum user procs limit had been reached.

So on the DB server we ran the following:

lsattr -EH -l sys0 | grep -i maxuproc

which resulted in the following:

maxuproc 1024 Maximum number of PROCESSES allowed per user True

1024 was less than the total number of pooled connections we had configured in WAS, so under load the database ran out of processes before any single pool hit its limit. A quick chat with a friendly AIX administrator to increase this setting then resolved the issue.
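As a rough sketch of the sum we should have done up front, the snippet below reads the limit out of an lsattr line and compares it with the combined pool maximums. The function names, the `reserved` parameter and the pool figures are made up for illustration and are not from our setup.

```python
def parse_maxuproc(lsattr_line):
    """Return the per-user process limit from an lsattr line such as
    'maxuproc 1024 Maximum number of PROCESSES allowed per user True'."""
    fields = lsattr_line.split()
    if len(fields) < 2 or fields[0] != "maxuproc":
        raise ValueError("not a maxuproc line: %r" % lsattr_line)
    return int(fields[1])


def pools_exceed_limit(pool_max_sizes, maxuproc, reserved=0):
    """True if the pools together could ask for more connections than the
    limit allows. 'reserved' is a hypothetical allowance for processes the
    database needs for itself."""
    return sum(pool_max_sizes) + reserved > maxuproc


# Hypothetical figures: 12 app servers, each with a pool maximum of 100
limit = parse_maxuproc(
    "maxuproc 1024 Maximum number of PROCESSES allowed per user True")
print(pools_exceed_limit([100] * 12, limit))
```

This assumes every pooled connection costs one process on the DB server, which is what we appeared to be seeing.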

Thursday, 15 April 2010

Testing WAS app without creating a session

Since writing a post (here) on in-memory session count, I have been doing endless amounts of work on sessions, tracing them to see how the reaper script works as well as how frequently it runs.

One of the big issues we were facing was the number of in-memory sessions we were creating. Due to memory limitations and an app that was creating large sessions, we have a limited number of sessions available, so understanding the ins and outs of session management has been useful.

In front of our IHS and WAS servers we had a load balancer that fired a request through to the front screen of the logon to see if the application was up and running. Getting the load balancer to test a static page on the web servers wasn't sufficient for our requirements. Given the frequency of the LB requests, though, and the fact that every request to the front page allocated a session, we would often end up with overflowed sessions.

Instead of hitting the app front page we tried to hit a simple jsp within the app, but WAS would still create a session for that request even though nothing in the application explicitly asked for one. After a bit of digging I found a line of code I could add to the jsp to stop this:

<%@page session="false" %>

This also means the stats I was producing in my previous post are now more accurate, as they no longer include the LB requests in the session count!
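One way to confirm the change has worked is to check that the health-check page no longer sends back a session cookie. The sketch below is only an illustration: the function name is made up, and it assumes the session cookie is called JSESSIONID (the WAS default) and has not been renamed.

```python
def creates_session(set_cookie_values, cookie_name="JSESSIONID"):
    """True if any Set-Cookie header value sets the session cookie."""
    for value in set_cookie_values:
        # The cookie name is everything before the first '='
        name = value.split("=", 1)[0].strip()
        if name == cookie_name:
            return True
    return False


# Header values as they might come back from two different pages
print(creates_session(["JSESSIONID=0000abc:-1; Path=/"]))
print(creates_session([]))

# Usage against a real server (hypothetical URL):
# import urllib.request
# resp = urllib.request.urlopen("http://myhost/myapp/healthcheck.jsp")
# print(creates_session(resp.headers.get_all("Set-Cookie") or []))
```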

Friday, 9 April 2010

javax.net.ssl.SSLHandshakeException: bad certificate

We have been doing some testing on WAS recently where our app makes a call to a 3rd party which hosts some static images. In our test environments though we were getting a "bad certificate" error.

Our key stores and trust stores all appeared to contain the valid certs that we thought were required. Unfortunately, even when we turned on tracing in WAS we couldn't see which certificate was causing the issue.

Due to firewalls and proxies, we couldn't hit the url direct from a PC, so we couldn't check it out manually. To allow us to see what certificates were being served, we used the openssl command, which listed the certs served by the target site we were trying to hit:

/usr/linux/bin/openssl s_client -connect www.ourtargethost.com:443 -showcerts

This showed the certificate chain, and the verify errors in the output highlighted what the issue with the certs was:

CONNECTED(00000003)
depth=0 /C=GB/ST=Somewhere/L=Warrington/O=My company Ltd/OU=HS4/CN=www.ourtargethost.com
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 /C=GB/ST=Somewhere/L=Warrington/O=My company Ltd/OU=HS4/CN=www.ourtargethost.com
verify error:num=27:certificate not trusted
verify return:1
depth=0 /C=GB/ST=Somewhere/L=Warrington/O=My company Ltd/OU=HS4/CN=www.ourtargethost.com
verify error:num=21:unable to verify the first certificate
verify return:1
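If you need to run the same check against a number of hosts, the verify errors can be pulled out of the s_client output with a few lines of Python. This is only a sketch that works on text shaped like the output above. The function name is made up and the sample is a shortened copy of that output.

```python
import re

VERIFY_ERROR = re.compile(r"^verify error:num=(\d+):(.*)$")


def verify_errors(s_client_output):
    """Return (code, message) tuples in the order seen, without repeats."""
    found = []
    for line in s_client_output.splitlines():
        match = VERIFY_ERROR.match(line.strip())
        if match:
            entry = (int(match.group(1)), match.group(2).strip())
            if entry not in found:
                found.append(entry)
    return found


sample = """CONNECTED(00000003)
depth=0 /C=GB/CN=www.ourtargethost.com
verify error:num=20:unable to get local issuer certificate
verify return:1
verify error:num=27:certificate not trusted
verify return:1
verify error:num=21:unable to verify the first certificate
verify return:1
"""

for code, message in verify_errors(sample):
    print(code, message)
```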

Wednesday, 31 March 2010

Monday, 8 March 2010

wsadmin and WAS commands hanging

In the last week we were having all sorts of problems getting any commands working even though they were running as root. I first noticed that whenever I tried to get into a wsadmin session, it would just hang.

There were no error messages and nothing obvious. We then discovered that all commands that end up running java under the covers were having the same issue, so startServer.sh, stopManager.sh, serverStatus.sh and pretty much all the supplied WAS scripts were affected.

We spent an age looking around at setupCmdLine and checking whether the OSGi bundles were causing an issue. We also found a fix in fp29 that seemed to be in a similar area, but that didn't resolve it. Just as we were about to log a call with IBM, we decided to take a javacore of the processes we were running whilst they were hung (why we didn't do this sooner I have no idea!).

We took several javacores, 30 seconds apart. Although there were no blocking threads, in each javacore, the main thread appeared to be looking up the localhost:

at java/net/Inet6AddressImpl.getLocalHostName(Native Method)
at java/net/InetAddress.getLocalHost(InetAddress.java:1463)

These same entries were in each of the javacores, so it appeared there was an issue getting the localhost name. Once we spotted this, it didn't take long to find out there was an issue contacting our DNS servers. We removed /etc/resolv.conf whilst we looked into this, so WAS would go back to using the hosts file on the server, and everything then jumped back into life.
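A quick way to spot this sort of thing without waiting for a javacore is to time the same kind of lookup outside of java. The sketch below is an illustration rather than what we ran at the time. The helper name is made up, and it takes the resolver as an argument so it can be tried with any lookup function.

```python
import socket
import time


def time_lookup(resolve, *args):
    """Call resolve(*args) and return (seconds_taken, result_or_error)."""
    start = time.time()
    try:
        result = resolve(*args)
    except OSError as err:
        # socket.gaierror and friends are subclasses of OSError
        result = err
    return time.time() - start, result


# Usage on the affected server: resolve the local host name and see how
# long it takes. Anything measured in seconds points at name resolution.
# elapsed, result = time_lookup(socket.gethostbyname, socket.gethostname())
# print("%.2fs %s" % (elapsed, result))
```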

*Added 19th Mar 2010

I have just been directed to this page from IBM which may well have resolved my problem. If you can't simply turn off DNS then this might be a preferred option:

IBM link swg21170467

Monday, 1 March 2010

*sys-package-mgr*: can't create package cache dir

We hit another issue today after applying a fix pack to our WAS system. Like the OSGi bundle issue, it was due to file permissions.

When we were running some jython scripts, we were trying to import some packages but got an error:

from org.python.modules import re
WASX7015E: Exception running command: "from org.python.modules import re"; exception information:
com.ibm.bsf.BSFException: exception from Jython:
Traceback (innermost last):
File "", line 1, in ?
ImportError: no module named org


To recreate the error, rather than running this in a script I just started up a wsadmin session and on doing so I got the following error:

*sys-package-mgr*: can't create package cache dir, '/temp/cachedir/packages'

After a bit of investigation, it turned out the "cachedir" directory was owned by root, but we run our scripts as a WASAdmin user. It looks like the ownership was changed to root after applying some fixpacks a couple of weeks ago, so a simple chown on cachedir resolved the issue.
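A check like the one below, run as the user that runs the scripts, would have shown the problem straight away. It is only a sketch: the function names are made up, and the demo uses a throwaway temp directory rather than the real cachedir path.

```python
import os
import tempfile


def can_use_cache_dir(path):
    """True if the current user can create files in the directory."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def describe(path):
    """Owner uid and permission bits, for comparing with the script user."""
    info = os.stat(path)
    return "%s owner uid=%d mode=%o" % (path, info.st_uid, info.st_mode & 0o777)


# Demo against a throwaway directory created by the current user
demo = tempfile.mkdtemp()
print(describe(demo))
print(can_use_cache_dir(demo))
```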

Tuesday, 9 February 2010

Websphere in memory session count

I have recently been working on an application that uses in-memory sessions. Our previous systems had always put session data in a database, so it was easy to find out how many active sessions we had at any one time. That is always useful when we have issues with a system, as we can state how many users have been affected.

To allow us to view this sort of information within WAS, I first had to enable PMI on each WAS server.

This article discusses the overhead of PMI

Now that PMI was enabled, it was simply a case of writing a jython script to run at regular intervals to get the data. The commands are as follows (I have taken out the commands that strip the data down to the format I was specifically after, so I will leave that up to your own jython skills to sort out):


servers = AdminTask.listServers('[-serverType APPLICATION_SERVER]').splitlines()
for server in servers:
    # Now just get the app server name - not the whole jython config id
    servername = server.split('(')[0]
    # get the session manager mbean
    ps = AdminControl.queryNames('WebSphere:type=SessionManager,process=' + servername + ',*')
    # no mbean comes back if the server is not running, so skip it
    if ps:
        # now get the stats for the mbean
        print AdminControl.getAttribute(ps, 'stats')


And hopefully you will get some output like this:

['', 'Stats name=My_WAR_FILE_NAME, type=servletSessionsModule', '{', 'name=SessionObjectSize, ID=18, description=The average size of the session objects at session level, including only serializable attributes in the cache., unit=BYTE, type=AverageStatistic, avg=1762.5, min=1713, max=1812, total=200925, count=114, sumSq=4.0370855625E10, type=TimeStatistic, avg=1762.5, min=1713, max=1812, total=200925, count=114, sumSq=4.0370855625E10', '}']

As well as the current count, I could also check out the session object size, which might be useful if you have a large number of sessions and a small heap size.
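For the stripping-down part, something along these lines would pull the numbers out of one of the statistic strings. It is a sketch in plain Python rather than the script I actually used. The function name is made up and the sample is cut from the output above.

```python
import re


def parse_stat(line):
    """Pull the name and the numeric fields out of one statistic line."""
    name = re.search(r"name=(\w+)", line)
    if name is None:
        return None
    values = {"name": name.group(1)}
    for key in ("avg", "min", "max", "total", "count"):
        # First occurrence only, as the fields are repeated later in the line
        match = re.search(r"\b%s=([0-9.]+)" % key, line)
        if match:
            values[key] = float(match.group(1))
    return values


sample = ("name=SessionObjectSize, ID=18, description=The average size of the "
          "session objects at session level, including only serializable "
          "attributes in the cache., unit=BYTE, type=AverageStatistic, "
          "avg=1762.5, min=1713, max=1812, total=200925, count=114")

print(parse_stat(sample))
```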

Sunday, 7 February 2010

WMSG1603E - An error occurred trying to read the bundle

I encountered a strange error this weekend whilst installing multiple fix packs across numerous WAS systems. Despite installing these fixes numerous times, this error only occurred on one system.

I was upgrading from WAS 6.1.0.21 to 6.1.0.27 - that included WAS, SDK, IHS and Plugin fixes.

After installing the fix packs when I restarted the server I got the following error:

WMSG1603E: An internal error occurred. It was not possible to register the WebSphere MQ JMS client with the application server due to exception org.osgi.framework.BundleException: An error occurred trying to read the bundle

followed by a java stack which included the following:

WMSG1603E: An internal error occurred. It was not possible to register the WebSphere MQ JMS client with the application server due to exception org.osgi.framework.BundleException: An error occurred trying to read the bundle

A quick search and the reason was obvious. This WAS system does not run as root but when I checked the file permissions on the org.osgi.framework bundles in {WAS_INSTALL_DIR}/profiles/{PROFILE_NAME}/configuration the bundle in question was owned by root:

drwxr-x--- 2 wasadm wasadm 256 07 Feb 10:25 org.eclipse.update
drwxr-xr-x 4 root system 256 07 Feb 10:26 org.eclipse.osgi
drwxr-x--- 3 wasadm wasadm 256 04 Jan 11:24 org.eclipse.core.runtime

A quick change of permissions on the directory and all sub directories followed by a restart and everything came up fine.
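To find this sort of thing after a fix pack without eyeballing directory listings, a walk over the profile directory can list anything not owned by the WAS user. The sketch below is an illustration only. The function name is made up, and the demo runs against a throwaway directory rather than a real profile.

```python
import os
import tempfile


def not_owned_by(top, uid):
    """Return paths under top (including top itself) not owned by uid."""
    wrong = []
    if os.lstat(top).st_uid != uid:
        wrong.append(top)
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                wrong.append(path)
    return wrong


# Demo: everything here is created by, and so owned by, the current user
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "org.eclipse.osgi"))
print(not_owned_by(demo, os.geteuid()))
```

On a real system you would pass the profile's configuration directory and the uid of the user that WAS runs as.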

Wednesday, 27 January 2010

Test connection on each node for cell scope datasource

I am sure pretty much every WAS administrator has used the "test connection" button in the WAS console to prove a JDBC datasource has been set up correctly.

Although not a problem, something that always bugged me, and didn't seem to be as good as it could be, is this: if you work in a large-scale environment you may well end up setting the datasource at cell level, as this cuts down on the time it takes to set up a datasource on each node or appserver and also reduces the likelihood of something being mistyped. However, if you do this and then run test connection, the connection is only made from the dmgr, as you can see by looking in the dmgr logs.

So what happens if you have 10, 20 or more nodes and you want to make sure that all of them can connect correctly to the database? You could telnet from each box to the DB server on the correct port, but that just shows network connectivity rather than a full database connection.

I assumed this could be done in a jython script but it took me a day or two to figure out. If I connected to the nodeagent in wsadmin, I couldn't get the config details of the datasources, as these are accessed from a wsadmin session connected to the dmgr. But running a test connection from a wsadmin session connected to the dmgr just does the same as the "test connection" button in the admin console.

In the end, I managed to write a simple unix script, which does the following:

1. Open a wsadmin session to the dmgr, passing in the name of the cell, to get the datasource ids and write these to a file.

2. Open a wsadmin session to each nodeagent, read the datasource ids from the file, then run a test connection.


This is the basic unix script:

###########################################################################
CELL=epwsdr21Cell
NODES="epwsdr21 epwsdr22 epwsdr23 epwsdr24 epbtdr21"
PORT=8878

echo "Running connection from each node to the datasources in WAS"

echo "Connecting to dmgr through wsadmin....."

# Connect to the dmgr through wsadmin - pass in the name of the cell - and run script dsconnect.py
/usr/was6/WebSphere/AppServer/bin/wsadmin.sh -lang jython -f ./jython/dsconnect.py "$CELL"

echo "Connecting to each nodeagent to run test connections....."

for node in $NODES; do
    echo "Connecting to ${node} on port ${PORT} through wsadmin"

    # Connect to each nodeagent in my list of nodes above - and run script dsconnect2.py
    /usr/was6/WebSphere/AppServer/bin/wsadmin.sh -lang jython -conntype SOAP -host "$node" -port "$PORT" -f ./jython/dsconnect2.py "$node"
done

echo 'Complete '

#############################################################################


And here is what is in the first jython script - dsconnect.py

# Jython script to get the datasource id's once connected to the dmgr through wsadmin


import sys
print ' '
print "Getting datasources for cell " + sys.argv[0]

# First build the cell name we are interested in from the cell name passed from the main script
constructcell ="/Cell:" + sys.argv[0] + "/"

# get the cell id

cellid = AdminConfig.getid( constructcell )

# Get the datasource id's

print 'Datasources found are listed below:'

# In this instance I am after v4 datasources for v5 datasources use "dsid = AdminConfig.list("DataSource", cellid).splitlines()"

dsid = AdminConfig.list("WAS40DataSource", cellid).splitlines()


print dsid

# Now open a tmp file and write the list of dsids to the file - this will be a string rather than a jython list

f=open('/tmp/dsconnect.out','w')

s=str(dsid)

f.write(s)

f.close()

##############################################################################

So the list of ids is written to /tmp/dsconnect.out. The second script is then called for each nodeagent I want to connect to, and runs a test connection against each id. The list that was written to the file is read back as a jython string rather than a list, which is why I use eval to turn it back into a list.


# Jython script to run a test connection on a list of datasources

import sys

# Open temp file to get the string of datasource ids and turn it back into a jython list
f=open('/tmp/dsconnect.out')
test=f.read()
dsids=eval(test)
f.close()

for ds in dsids:
    print ' '
    print 'Testing connection from ' + sys.argv[0] + ' to ' + ds
    try:
        outp=AdminControl.testConnection(ds)
    except:
        # outp is not set when testConnection fails, so print the exception details instead
        print 'Error connecting to datasource'
        print sys.exc_info()[1]
    else:
        print outp

##############################################################################
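The hand-off through the temp file could also be done without eval, by writing one id per line and splitting the file on the way back in. This is a sketch of that alternative in plain Python, not what the scripts above do. The function names and the sample ids are made up, and it relies on ids never containing a newline.

```python
import os
import tempfile


def write_ids(path, ids):
    """Write one datasource id per line."""
    with open(path, "w") as f:
        for item in ids:
            f.write(item + "\n")


def read_ids(path):
    """Read the ids back as a list, skipping any blank lines."""
    with open(path) as f:
        return [line for line in f.read().splitlines() if line.strip()]


# Hypothetical ids, shaped like wsadmin config ids
ids = ["ds1(cells/myCell|resources.xml#WAS40DataSource_1)",
       '"my ds(cells/myCell|resources.xml#WAS40DataSource_2)"']

path = os.path.join(tempfile.mkdtemp(), "dsconnect.out")
write_ids(path, ids)
print(read_ids(path) == ids)
```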


The jython script doesn't show up as well as I would have thought on here. If you are after the scripts then drop me a mail info@janglestrategies.co.uk and I'll email them over

Thursday, 21 January 2010

Virtual Host matching - part 2

I have been doing a bit more in-depth testing on the development system where I had the vhost matching problems:

This is my earlier post

I had just updated the "Virtual Host matching" setting to "physically use the port specified in the request". Although this allowed the plugin to correctly match the ports by using the actual request rather than what was in the header record, once the request hit the application server it went back to using the header record to match to a vhost or web group.

To ensure the app server also does the matching on the actual request rather than what is in the header, I updated the "Application Server Port preference" to "web server port" in the admin console: Servers > Web Servers > "server name" > plugin properties > Request and response

In terms of the plugin file, this now has AppServerPortPreference="WebserverPort" as well as VHostMatchingCompat="true".
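To check a generated plugin file for both settings, something like the sketch below would do. It assumes both attributes sit on the root Config element of plugin-cfg.xml, which is where I would expect them. The function name is made up and the sample XML is a cut-down stand-in, not a real plugin file.

```python
import xml.etree.ElementTree as ET


def port_matching_settings(plugin_cfg_xml):
    """Return the two port matching attributes from the root element."""
    root = ET.fromstring(plugin_cfg_xml)
    return {
        "AppServerPortPreference": root.get("AppServerPortPreference"),
        "VHostMatchingCompat": root.get("VHostMatchingCompat"),
    }


# Cut-down stand-in for a plugin-cfg.xml file
sample = ('<Config AppServerPortPreference="WebserverPort" '
          'VHostMatchingCompat="true"></Config>')

print(port_matching_settings(sample))
```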