icon-arrow icon-check icon-mail icon-phone icon-facebook icon-linkedin icon-youtube icon-twitter icon-cheveron icon-download icon-instagram play close close icon-arrow-uturn icon-calendar icon-clock icon-search icon-chevron-process icon-skills icon-knowledge icon-kite icon-education icon-languages icon-tools icon-experience icon-coffee-cup
Werken bij Integration & Application Talents
Blog 09/01/2014

WebLogic troubleshooting: Diagnostic watch causing stuck threads.

Diagnostic

In one of our SOA/BPM environments, the stability of the WebLogic server running the SOA and BPM Suite was not as it should be. There were problems deploying ear files and accessing the BPM workspace etcetera. Enterprise Manager Fusion Middleware control homepage would show that everything is running fine. But accessing the soainfra pages would fail.

failed_to_get

In the AdminServer logs the following relevant entries  were found. Apparently the AdminServer fails at communicating with the soa_server1 managed server.

####<Dec 19, 2013 3:36:28 PM CET> <Info> <JMX> <soasuite.whitehorses.nl> <AdminServer> <[STANDBY] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-51af12ed:142dc08086b:-8000-000000000000607a> <1387463788064> <BEA-149506> <Established JMX Connectivity with soa_server1 at the JMX Service URL of service:jmx:t3://soasuite.whitehorses.nl:8001/jndi/weblogic.management.mbeanservers.runtime.>
<sniped/>
####<Dec 19, 2013 3:36:57 PM CET> <Error> <Socket> <soasuite.whitehorses.nl> <AdminServer> <ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'> <<WLS Kernel>> <> <ba62fa7fe259f172:-51af12ed:142dc08086b:-8000-000000000000609b> <1387463817690> <BEA-000403> <IOException occurred on socket: Socket[addr=soasuite.whitehorses.nl/10.42.193.15,port=8001,localport=54004]
weblogic.socket.MaxMessageSizeExceededException: Incoming message of size: '10000080' bytes exceeds the configured maximum of: '10000000' bytes for protocol: 't3'.
weblogic.socket.MaxMessageSizeExceededException: Incoming message of size: '10000080' bytes exceeds the configured maximum of: '10000000' bytes for protocol: 't3'

This would explain the problems in the EM Fusion Middleware control.

Checking Enterprise Manager Cloud Control, I saw some interesting things. There were 3 threads stuck in a ‘lock’ state. These are related to the workflow engine of the SOA Suite.

emcc_livethread_pool

PS: The live thread analysis is part of the WebLogic Management Pack EE.

In the WebLogic soa managed server logfile there were stuck thread entries relating to the application requesting task information (tasks) from the workflow engine . Pages would timeout.

####<Dec 19, 2013 5:20:53 PM CET> <Error> <WebLogicServer> <soasuite.whitehorses.nl> <soa_server1> <[ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-199d368f:1430b8589ce:-8000-000000000000518f> <1387470053650> <BEA-000337> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "631" seconds working on the request "Workmanager: default, Version: 2, Scheduled=true, Started=true, Started time: 631023 ms
[
POST /app/faces/home?_adf.ctrl-state=5mza8fbr5_4 HTTP/1.1
Accept: */*
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Adf-Rich-Message: true
Referer: http://soasuite.whitehorses.nl:8001/app/faces/home?_adf.no-new-window-redirect=true&_adf.ctrl-state=5mza8fbr5_4&_afrLoop=5893948270324306
Accept-Language: nl-NL
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Content-Length: 590
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=tm4dSzYPL1StgcG4L9nWxTJhXl4Zs6p8bQxSlWVdYl5db1ZdBwLF!-113390690]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
null>

Digging through MyOracleSupport, I found  MOS note 1417194.1 on WebLogic Diagnostics Module configuration, which can cause the problems relating to the MaxMessageSizeExceededException messages we were seeing.

The following following piece from the note disturbed me a little:

In a production environment, it is not recommended to have diagnostic watches configured as we have seen them to cause problems in the past.

The solution is to disable the diagnostic watches.

em_disable_diag_watch

scripted with WLST:

edit()
startEdit()
cd('/WLDFSystemResources/Module-FMWDFW/WLDFResource/Module-FMWDFW/WatchNotification/Module-FMWDFW')
cmo.setEnabled(false)
save()
activate()

After disabling the diagnostics watch, we started seeing stack traces filling up the server logfile of the SOA managed server. One error would generate 3 entries in the logfile, but those three entries accounted for more than 1000 lines of stackdumps! So these errors were triggering the diagnostic watch, which in itself would fail, causing the application threads to get stuck.

First log entry

####<Jan 9, 2014 8:51:12 AM CET> <Notice> <Stdout> <soasuite.whitehorses.nl> <soa_server1> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-5d99ff34:1437178ccb6:-8000-00000000000366ee> <1389253872864> <BEA-000000> <<Jan 9, 2014 8:51:12 AM CET> <Error> <oracle.soa.services.workflow.common> <BEA-000000> <<.>
ORABPEL-0

        at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2528)
        at oracle.bpel.services.workflow.repos.driver.PersistencyService.getWFTask(PersistencyService.java:941)
        at oracle.bpel.services.workflow.query.impl.TaskQueryService.queryTasks(TaskQueryService.java:898)
        at sun.reflect.GeneratedMethodAccessor1801.invoke(Unknown Source)
<...sniped/>

Caused By: ORABPEL-0

        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1402)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
        at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:4327)
        at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:3662)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2522)
<..sniped/>

Caused By: ORABPEL-30084

Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.

        at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getComponentModel(WorkflowServiceEngine.java:2806)
        at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getTaskDefinition(WorkflowServiceEngine.java:2421)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskDefinition(TaskMetadataCache.java:355)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getWorkflowConfiguration(TaskMetadataCache.java:619)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskPrivilegeMetadata(TaskMetadataCache.java:916)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1339)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
<sniped/>
>>

Second log entry

<ba62fa7fe259f172:-5d99ff34:1437178ccb6:-8000-00000000000366ef> <1389253872871> <BEA-000000> <<Jan 9, 2014 8:51:12 AM CET> <Error> <oracle.soa.services.workflow.task> <BEA-000000> <<.> Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.

ORABPEL-30084

Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.

        at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getComponentModel(WorkflowServiceEngine.java:2806)
        at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getTaskDefinition(WorkflowServiceEngine.java:2421)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskDefinition(TaskMetadataCache.java:355)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getWorkflowConfiguration(TaskMetadataCache.java:619)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskPrivilegeMetadata(TaskMetadataCache.java:916)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1339)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
        at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:4327)
        at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:3662)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2522)
        at oracle.bpel.services.workflow.repos.driver.PersistencyService.getWFTask(PersistencyService.java:941)
        at oracle.bpel.services.workflow.query.impl.TaskQueryService.queryTasks(TaskQueryService.java:898)
<sniped/>
>>

Third log entry

####<Jan 9, 2014 8:51:12 AM CET> <Notice> <Stdout> <soasuite.whitehorses.nl> <soa_server1> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <ba62fa7fe259f172:-5d99ff34:1437178ccb6:-8000-00000000000366f0> <1389253872876> <BEA-000000> <<Jan 9, 2014 8:51:12 AM CET> <Error> <oracle.soa.services.workflow.common> <BEA-000000> <<.>
ORABPEL-0
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1402)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
        at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:4327)
        at oracle.bpel.services.workflow.repos.driver.WFTask.setValuesFromResultSet(WFTask.java:3662)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getWFTask(WFTask.java:2522)
        at oracle.bpel.services.workflow.repos.driver.PersistencyService.getWFTask(PersistencyService.java:941)
        at oracle.bpel.services.workflow.query.impl.TaskQueryService.queryTasks(TaskQueryService.java:898)
<sniped/>

Caused By: ORABPEL-30084

Could not locate composite.
Could not locate composite for workflow component myPartition/bpmcomposite!1.30/T05_CHECK.
Ensure composite has been successfully deployed, and that the SOA server has completed loading composites.

        at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getComponentModel(WorkflowServiceEngine.java:2806)
        at oracle.bpel.services.workflow.fabric.WorkflowServiceEngine.getTaskDefinition(WorkflowServiceEngine.java:2421)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskDefinition(TaskMetadataCache.java:355)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getWorkflowConfiguration(TaskMetadataCache.java:619)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataCache.getTaskPrivilegeMetadata(TaskMetadataCache.java:916)
        at oracle.bpel.services.workflow.metadata.impl.TaskMetadataService.getTaskVisibilityRules(TaskMetadataService.java:1339)
        at oracle.bpel.services.workflow.repos.driver.WFTask.getVisibilityRules(WFTask.java:5874)
<sniped/>
>>

Here we see the workflow/getTask code again, which we actually also saw in the stuck thread information in Cloud Control. It is nice to see the pieces of the puzzle fit together.

This problem is related to an undeploy that had errors or failed somewhere (see MOS note 1597584.1). Some task metadata is still present in the SOA repository, which needs to be removed.

Delete the offending row from the wftaskmetadata table and restart the WebLogic managed server.

The errors are gone and the WebLogic server is stable again.

References

weblogic.socket.MaxMessageSizeExceededException Appearing when Managed Server Attempting To Send Message Larger Than 60 MB To The Admin Server (Doc ID 1417194.1)

ORABPEL-30084: Could not locate composite for workflow component (Doc ID 1597584.1)

Overzicht blogs

Geen reacties

Geef jouw mening

Reactie plaatsen

Reactie toevoegen

Jouw e-mailadres wordt niet openbaar gemaakt.

Geen HTML

  • Geen HTML toegestaan.
  • Regels en alinea's worden automatisch gesplitst.
  • Web- en e-mailadressen worden automatisch naar links omgezet.

Wil je deel uitmaken van een groep gedreven en ambitieuze experts? Stuur ons jouw cv!