WebLogic troubleshooting: Diagnostic watch causing stuck threads.
In one of our SOA/BPM environments, the stability of the WebLogic server running the SOA and BPM Suite was not as it should be. There were problems deploying ear files and accessing the BPM workspace etcetera. Enterprise Manager Fusion Middleware control homepage would show that everything is running fine. But accessing the soainfra pages would fail.
In the AdminServer logs the following relevant entries were found. Apparently the AdminServer fails at communicating with the soa_server1 managed server.
####<Dec 19, 2013 3:36:28 PM CET> <Info> <JMX> <soasuite.whitehorses.nl> <AdminServer> <[STANDBY] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-51af12ed:142dc08086b:-8000-000000000000607a> <1387463788064> <BEA-149506> <Established JMX Connectivity with soa_server1 at the JMX Service URL of service:jmx:t3://soasuite.whitehorses.nl:8001/jndi/weblogic.management.mbeanservers.runtime.>
<sniped/>
####<Dec 19, 2013 3:36:57 PM CET> <Error> <Socket> <soasuite.whitehorses.nl> <AdminServer> <ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'> <<WLS Kernel>> <> <ba62fa7fe259f172:-51af12ed:142dc08086b:-8000-000000000000609b> <1387463817690> <BEA-000403> <IOException occurred on socket: Socket[addr=soasuite.whitehorses.nl/10.42.193.15,port=8001,localport=54004]
weblogic.socket.MaxMessageSizeExceededException: Incoming message of size: '10000080' bytes exceeds the configured maximum of: '10000000' bytes for protocol: 't3'.
weblogic.socket.MaxMessageSizeExceededException: Incoming message of size: '10000080' bytes exceeds the configured maximum of: '10000000' bytes for protocol: 't3'
This would explain the problems in the EM Fusion Middleware control.
Checking Enterprise Manager Cloud Control, I saw some interesting things. There were 3 threads stuck in a ‘lock’ state. These are related to the workflow engine of the SOA Suite.
PS: The live thread analysis is part of the WebLogic Management Pack EE.
In the WebLogic soa managed server logfile there were stuck thread entries relating to the application requesting task information (tasks) from the workflow engine . Pages would timeout.
####<Dec 19, 2013 5:20:53 PM CET> <Error> <WebLogicServer> <soasuite.whitehorses.nl> <soa_server1> <[ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-199d368f:1430b8589ce:-8000-000000000000518f> <1387470053650> <BEA-000337> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "631" seconds working on the request "Workmanager: default, Version: 2, Scheduled=true, Started=true, Started time: 631023 ms
[
POST /app/faces/home?_adf.ctrl-state=5mza8fbr5_4 HTTP/1.1
Accept: */*
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Adf-Rich-Message: true
Referer: http://soasuite.whitehorses.nl:8001/app/faces/home?_adf.no-new-window-redirect=true&_adf.ctrl-state=5mza8fbr5_4&_afrLoop=5893948270324306
Accept-Language: nl-NL
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Content-Length: 590
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=tm4dSzYPL1StgcG4L9nWxTJhXl4Zs6p8bQxSlWVdYl5db1ZdBwLF!-113390690]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
null>
Digging through MyOracleSupport, I found MOS note 1417194.1 on WebLogic Diagnostics Module configuration, which can cause the problems relating to the MaxMessageSizeExceededException messages we were seeing.
The following following piece from the note disturbed me a little:
In a production environment, it is not recommended to have diagnostic watches configured as we have seen them to cause problems in the past.
The solution is to disable the diagnostic watches.
scripted with WLST:
edit()
startEdit()
cd('/WLDFSystemResources/Module-FMWDFW/WLDFResource/Module-FMWDFW/WatchNotification/Module-FMWDFW')
cmo.setEnabled(false)
save()
activate()
After disabling the diagnostics watch, we started seeing stack traces filling up the server logfile of the SOA managed server. One error would generate 3 entries in the logfile, but those three entries accounted for more than 1000 lines of stackdumps! So these errors were triggering the diagnostic watch, which in itself would fail, causing the application threads to get stuck.
First log entry
####<Jan 9, 2014 8:51:12 AM CET> <Notice> <Stdout> <soasuite.whitehorses.nl> <soa_server1> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-5d99ff34:1437178ccb6:-8000-00000000000366ee> <1389253872864> <BEA-000000> <<Jan 9, 2014 8:51:12 AM CET> <Error> <oracle.soa.services.workflow.common> <BEA-000000> <<.>
ORABPEL-0
at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2528)
at oracle.bpel.services.workflow.repos.driver.PersistencyService.getWFTask(PersistencyService.java:941)
at oracle.bpel.services.workflow.query.impl.TaskQueryService.queryTasks(TaskQueryService.java:898)
at sun.reflect.GeneratedMethodAccessor1801.invoke(Unknown Source)
<...sniped/>
Caused By: ORABPEL-0
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1402)
at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:4327)
at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:3662)
at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2522)
<..sniped/>
Caused By: ORABPEL-30084
Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.
at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getComponentModel(WorkflowServiceEngine.java:2806)
at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getTaskDefinition(WorkflowServiceEngine.java:2421)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskDefinition(TaskMetadataCache.java:355)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getWorkflowConfiguration(TaskMetadataCache.java:619)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskPrivilegeMetadata(TaskMetadataCache.java:916)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1339)
at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
<sniped/>
>>
Second log entry
<ba62fa7fe259f172:-5d99ff34:1437178ccb6:-8000-00000000000366ef> <1389253872871> <BEA-000000> <<Jan 9, 2014 8:51:12 AM CET> <Error> <oracle.soa.services.workflow.task> <BEA-000000> <<.> Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.
ORABPEL-30084
Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.
at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getComponentModel(WorkflowServiceEngine.java:2806)
at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getTaskDefinition(WorkflowServiceEngine.java:2421)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskDefinition(TaskMetadataCache.java:355)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getWorkflowConfiguration(TaskMetadataCache.java:619)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskPrivilegeMetadata(TaskMetadataCache.java:916)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1339)
at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:4327)
at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:3662)
at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2522)
at oracle.bpel.services.workflow.repos.driver.PersistencyService.getWFTask(PersistencyService.java:941)
at oracle.bpel.services.workflow.query.impl.TaskQueryService.queryTasks(TaskQueryService.java:898)
<sniped/>
>>
Third log entry
####<Jan 9, 2014 8:51:12 AM CET> <Notice> <Stdout> <soasuite.whitehorses.nl> <soa_server1> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-5d99ff34:1437178ccb6:-8000-00000000000366f0> <1389253872876> <BEA-000000> <<Jan 9, 2014 8:51:12 AM CET> <Error> <oracle.soa.services.workflow.common> <BEA-000000> <<.>
ORABPEL-0
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1402)
at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:4327)
at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:3662)
at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2522)
at oracle.bpel.services.workflow.repos.driver.PersistencyService.getWFTask(PersistencyService.java:941)
at oracle.bpel.services.workflow.query.impl.TaskQueryService.queryTasks(TaskQueryService.java:898)
<sniped/>
Caused By: ORABPEL-30084
Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.
at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getComponentModel(WorkflowServiceEngine.java:2806)
at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getTaskDefinition(WorkflowServiceEngine.java:2421)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskDefinition(TaskMetadataCache.java:355)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getWorkflowConfiguration(TaskMetadataCache.java:619)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskPrivilegeMetadata(TaskMetadataCache.java:916)
at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1339)
at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
<sniped/>
>>
Here we see the workflow/getTask code again, which we actually also saw in the stuck thread information in Cloud Control. It is nice to see the pieces of the puzzle fit together.
This problem is related to an undeploy that had errors or failed somewhere (see MOS note 1597584.1). Some task metadata is still present in the SOA repository, which needs to be removed.
Delete the offending row from the wftaskmetadata table and restart the WebLogic managed server.
The errors are gone and the WebLogic server is stable again.
References
ORABPEL-30084: Could not locate composite for workflow component (Doc ID 1597584.1)
Overzicht blogs
Geen reacties
Geef jouw mening
Reactie plaatsenReactie toevoegen