EM Console Slowness
and Stuck Thread Issue
Let me share my experience with EM (Enterprise Manager) console
which got installed as part of Oracle Fusion Middleware 11g Installation.
Before I start discussing the problem and solution in details, here are some brief about the environment which we were running –
Before I start discussing the problem and solution in details, here are some brief about the environment which we were running –
Item
|
Description
|
Domain
|
1 Admin Server and
22 Managed servers (includes 8 SOA mservers)
|
Machine
|
sun4v, 2 physical T4
servers having 12 logical LDOMs, 30 GB Memory and 4 CPU each LDOM. 2 Managed servers
installed on each LDOM just for 11g application.
|
OS
|
SunOS 5.10 64 bit
Installation
|
Database
|
11.2.0.2.0
|
FMW
|
11.1.1.6.
|
1 Problem
Statement:
In production when multiple users 15-20 start accessing the EM
console, then it’s start behaving badly, response was too slow, login was not
happening properly, was taking long time around 6-10 minutes, login time in all
attempt was not consistent. In few of attempts we were not able to login at
all, it was hanging during page loading process etc etc, for few attempts we
were able to login but then internal links e.g. Dashboard tab page, Instance
tab page, Fault and recovery tab page, most of the link were extremely slow.
At JVM level the threads which was getting initiated for EM
console was getting declared as Stuck threads in AdminServer.out file and after
few hours e.g. 4 hrs or so Admin server was getting crash since we have
overload configuration implemented for all the server in our environment where
we have defined if the Stuck thread counts increase more than 60 then server
should stop itself.
All the time, only one error message which we were able to see
in .out file is this –
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Referer:
http://:/em/faces/as/as/wlFarmHome?target=Farm_PROD1-SOA-domain&type=oracle_ias_farm&_afrLoop=2532193472836150&_afrWindowMode=0&_afrWindowId=null
Cookie:
JSESSIONID=1nfKQXdBkkPY4nNTxfM4ZLtX19BcsbpFy2nQ2KJ01t4KwpR4NQyb!-86916873;
s_pers=%20s_evar41%3D%255B%255B'e%253A21813dm%253Av%253Apers%253Acus%253Ambb%253Aredeemmodem%253A%253Aredeem-mega-modem-short-lnk'%252C'1303350152411'%255D%255D%7C1461202952411%3B%20s_prop19%3Df429da9da84831ca7aea17ae80f486d0%7C1334887942700%3B%20s_visit%3D1%7C1303353770759%3B%20s_prop25%3Dv%253Abus%253Acorporate-offer%7C1322794757110%3B%20s_nr%3D1322792957112%7C1325384957112%3B;
s_vi=[CS]v1|26D7C5CA85149F1B-600001670000530B[CE]; PRX-LB=185106698.36895.0000;
ADMINCONSOLESESSION=JxZXQXYXBnL2lpXCQZLphk56rm2Hj4jLTZx4yPnftNQPvkY4Lv9q!-86916873
Cache-Control: max-stale=0
Connection: Keep-Alive
X-BlueCoat-Via: C540871B14123F46
]", which is more than the configured time
(StuckThreadMaxTime) of "600" seconds. Stack trace: null>
Above error message was not giving detailed information that at
what steps EM is hanging, we have took several round of thread dump but were
not able to conclude root cause of failure.
We have raised this concern with Oracle, also we took the help
from all other accounts wherever FMW11g got implemented, SR run almost 4 month
to resolve this issue, multiple changes we have done in various deployment
windows and at last EM was stable, no login problem, always get logged in, all
the internal tab pages were working fine and fetching the data in expected
time.
2 Solution:
A series of changes
has been implemented which I am going to list out here –
2.1 Change1: Enable the “Data Display Option”
Enable the checkbox
for below options-
Disabled fetching of
instance and fault matrices count.
Restrict display of
instance and fault to the last 30 minute.
Initially we have reduced time for “restrict display
of instance and fault” from 24 hrs to 30min, later from 30min to 10min, and
then finally 10min to 0min. So during initial login it will not fetch any
records but instance can be pulled out on demand.
2.2 Change2: Increased cache timeout for discovery
During login process in Em console, EM does three things 1)
Authentication 2) Discovery of targets and 3) loading the page.
Discovery of target step was taking too long around 10 minute
since Domain was quite big and having a big list of targets.
Oracle note 1423893.1 has been implemented to cache the
discovery result, so the sub sequent login attempt will be fast - Following
given Mbeans attributes has been added into to improve performance –
oracle.sysman.emas.discovery.wls.FMW_DISCOVERY_MAX_CACHE_AGE
|
28800000
|
oracle.sysman.emas.discovery.wls.FMW_DISCOVERY_MAX_WAIT_TIME
|
30000
|
oracle.sysman.emas.discovery.wls.FMW_DISCOVERY_USE_CACHED_RESULTS
|
True
|
LargeRepository
|
True
|
All timing for mbeans
in milliseconds.
After implementing
above change, very first user login will take time after restart of Admin and
the discovery result will be stored in cache and sub sequent login attempt will
be fast.
2.3 Change3: increased the PermSize of Admin
server JVM
Max PERM size has been
increased from 512M to 1024M
MEM_PERM_SIZE_64BIT="-XX:PermSize=1024m”
2.4 Change4: Decrease the frequency of DMS
application
EM has one inbuilt application called Dynamic Monitoring System
(DMS) which does the status collection for all the targets wherever DMS got
deployed, if the frequency of DMS stats collection will be too fast then it
makes EM console to slow down. To increase the frequency here are the steps-
Increased below parameters in dms_config.xml file given at this
location –
$MiddlewareHome/fmw/soa11.1.1.6/oracle_common/modules/oracle.dms_11.1.1/server_config
prefetch intervalSeconds from 15 second to 120,
Discover intervalSeconds from 180 second to 300 seconds
### snap of last few
line of dms_config.xml
N.B.
Above highlighted
configuration was also suggested by Oracle but its was causing issue a lot and
got rollbacked after some time.
2.5 Change5: Un-Target DMS application from
unwanted servers.
In thread dumps which we took we realized that most of threads
are getting stuck at DMS application level and decided to disabled DMS
application for all servers.
In Weblogic Admin Console >> Domain>> Deployment
>> DMS >> Target tab page >> uncheck all servers.
Initially we have disabled the DMS application for entire SOA
cluster and OSB cluster, and observed huge significant improvement in logging
time, instead of taking 2min or 4 min, now EM was hardly taking 30 seconds to
login but internal links response time stand same as it was previously, but
login time got improved a lots.
But with this configuration we have notice lots of other new
issues where most of configuration in SOA-Infrastructure was not working, we
were not able to make any configuration changes e.g. under
FARM>>SOA>>SOA-INFRA>>SOA Administration>>Common
Properties, BPEL properties etc.
All the components e.g. BPEL, Mediator, human workflow, business
rule, fault and rejected message process everything got impacted once we
disabled the DMS application for SOA cluster.
Due to above impact we have to again enable the DMS application
for SOA servers alone.
Since DMS is quite bulky application which does stats collection
from all the targets where it got deployed, so it was advisable to disable DMS
for some of the targets where monitoring was not required to improve login
response in EM console.
Weblogic Admin Console >> Domain>> Deployment
>> DMS >> Target tab page >> check all SOA servers alone.
2.6 Change6: Unicast mode of communication has
been implemented for SOA and OSB cluster
Oracle note id 1437883.1 has
been followed to implement the same-
If you have large domain, having multiple clusters e.g. Weblogic
cluster, coherence cluster etc then preferred way for communication between all
the members in cluster should be unicast.
In our environment Weblogic clustering was already enabled in
Unicast mode but coherence cluster was using multicast mode and we have
identified this using string which was present in setDomain.env file –
-Dtangosol.coherence.clusteraddress=227.7.7.9
-Dtangosol.coherence.clusterport=9778 -Dtangosol.coherence.log=jdk
Above entries got
removed at domain level and for each managed server new configuration for
coherence cluster got added –
e.g. for
OSB servers and SOA servers–
-Dtangosol.coherence.wka1=hostname1
-Dtangosol.coherence.wka2=hostname2
-Dtangosol.coherence.localhost=
hostname1-app
-Dtangosol.coherence.localport=hostname1Port
-Dtangosol.coherence.wka1.port=hostname1Port
-Dtangosol.coherence.wka2.port=hostname2Port
Note: You can use same
port number for all hostname, also if you need more logging for coherence
component add this line in your setDomain.env file under JAVA_Option
-Dtangosol.coherence.log=stdout #
this will print details logs for members joining coherence cluster etc.
2.7 Change7: Disabled “BPEL recovery console”
option from Dashboard
Once you logged in into EM console, during dashboard page
loading, EM will try to fetch the data from “DLV_MESSAGE” tables for “invoke”
and “callback” activities which take longer time since DLV_ MESSAGE table
normally is quite huge.
In order to disable the same please perform this –
EM console>>Farm>>soa-infra>>Administration>>System
Mbeam Browser>> Filter>> Type bean name
"oracle.as.soainfra.config:name=soa-infra,*", >> In result
click on "AduitConfig" attribute and change the
"bpelRecoveryStatus=Off", default value for bpelRecoveryStatus is
"on"
N.B.
In above configuration
“O” should be capital letter of “Off”, small ‘o’ in ‘off’ will not work.
2.8 Change8: Added JVM parameter at domain level
“-XX:+UseMembar”
In threads dump Oracle has identified that few of threads
concurrently getting locked to one object, this mainly happening because they
observed whenever thread state is getting change its not notifying to others
waiting threads, to get rid from this situation above JVM parameter got added
###Snap of stack trace
which we have noticed in our thread dump
"[STUCK]
ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'"
daemon prio=3 tid=0x00000001067c4800 nid=0x47 waiting on condition
[0xffffffff5b4f9000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0xfffffffeca281c30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
at oracle.adf.model.dcframe.DataControlFrameImpl.lock(DataControlFrameImpl.java:482)0xfffffffeca281c30>
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0xfffffffeca281c30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
at oracle.adf.model.dcframe.DataControlFrameImpl.lock(DataControlFrameImpl.java:482)0xfffffffeca281c30>
2.9 Change9: Added JVM parameter at domain level
“-Dweblogic.management.disableManagedServerNotifications=true”
Above JVM parameter has been suggested by Oracle,
this parameter will help to reduce JMX notification which occur between Admin
and Managed servers whenever any new component get added, any state get changes
etc.
This configuration got added only for Admin Server Instance, not
at Manage server level, and we have notice huge benefit in EM after this
change, now EM was not at all hanging during login time, internal link were
slow but at least multiple users were able to login into EM.
2.10 Change10: Modified JVM parameter to print more
details information in Thread dump at Domain level–
Added the following Java option at
domain level for all servers. This will help to provide more representative
information in thread dumps which help during investigation..
“-XX:+PrintConcurrentLocks”
2.11 Change11: Patch applied in Environment
specific to EM Console Slowness Problem
1.
Patch Number 12972254,
download file name “p12972254_1036_Generic”
2.
Patch Number 14653405,
download file name “fmw_diagnostics_14653405”
3.
Patch Number 14750864,
download file name “p14750864_111160_Generic”
4.
Patch Number 14812586,
download file name “p14812586_111160_Generic”
5.
Patch Number 15853874,
download file name “p15853874_111160_Generic”
6.
Patch Number 15847755,
download file name “p15847755_111160_Generic”
N.B.
You might not get
above patches on Oracle Support website, since we got these patches directly
uploaded into SR level. Also, please confirm with Oracle Support before
deploying any PATCH into your environment, this given information is just for
reference and does not be supported by any Vendor.
2.12 Change12: Identification of EM Queries for
tunning purpose
Now after doing all sort of tunning at application level, login
problem got resolved but still internal links to instance tab page, fault and
rejected tab page and BPEL recovery console were taking long time to respond,
then we started looking into AWR report and found that couple of queries are
taking huge time around 20 minute as result EM is extremely slow for internal
links and was keep spinning.
We manage to get the EM queries which EM runs in the background
once users click on Instance, dashboard, fault and rejected message and BPEL recovery
console tab pages.
Here are the queries –
2.12.1DASHBOARD - BPEL Recovery console queries
SELECT * FROM (SELECT
/*+ FIRST_ROWS */ a.*, ROWNUM rnum FROM (SELECT MESSAGE_GUID AS a1,
DLV_TYPE AS a2, CIKEY AS a3, CLUSTER_NODE_ID AS a4, CLUSTER_NODE_KEY AS a5,
COMPONENT_NAME AS a6, COMPONENT_TYPE AS a7, COMPOSITE_LABEL AS a8,
COMPOSITE_NAME AS a9, COMPOSITE_REVISION AS a10, CONV_ID AS a11, DOMAIN_NAME AS
a12, ECID AS a13, EVENT_NAME AS a14, EXT_INT1 AS a15, EXT_STRING1 AS a16,
EXT_STRING2 AS a17, HEADER_PROPERTIES_BIN_FORMAT AS a18, HEADERS_REF_ID AS a19,
OPERATION_NAME AS a20, PARTNER_LINK AS a21, PRIORITY AS a22, PROPERTIES AS a23,
RECEIVE_DATE AS a24, RECOVER_COUNT AS a25, STATE AS a26, TENANT_ID AS a27,
CONV_TYPE AS a28, RES_SUBSCRIBER AS a29 FROM DLV_MESSAGE WHERE
((((COMPONENT_TYPE = ?) AND (STATE IN (?, ?))) AND (RECEIVE_DATE <= ?)) AND
(DLV_TYPE = ?)) ORDER BY RECEIVE_DATE DESC) a WHERE ROWNUM <= ?) WHERE rnum
> ?
bind
=> [bpel, 0, 1, 2012-10-31 12:07:36.044, 2,
1, 0]
SELECT * FROM (SELECT
/*+ FIRST_ROWS */ a.*, ROWNUM rnum FROM (SELECT MESSAGE_GUID AS a1,
DLV_TYPE AS a2, CIKEY AS a3, CLUSTER_NODE_ID AS a4, CLUSTER_NODE_KEY AS a5,
COMPONENT_NAME AS a6, COMPONENT_TYPE AS a7, COMPOSITE_LABEL AS a8,
COMPOSITE_NAME AS a9, COMPOSITE_REVISION AS a10, CONV_ID AS a11, DOMAIN_NAME AS
a12, ECID AS a13, EVENT_NAME AS a14, EXT_INT1 AS a15, EXT_STRING1 AS a16,
EXT_STRING2 AS a17, HEADER_PROPERTIES_BIN_FORMAT AS a18, HEADERS_REF_ID AS a19,
OPERATION_NAME AS a20, PARTNER_LINK AS a21, PRIORITY AS a22, PROPERTIES AS a23,
RECEIVE_DATE AS a24, RECOVER_COUNT AS a25, STATE AS a26, TENANT_ID AS a27,
MASTER_CONV_ID AS a28 FROM DLV_MESSAGE WHERE ((((COMPONENT_TYPE = ?) AND (STATE
IN (?, ?))) AND (RECEIVE_DATE <= ?)) AND (DLV_TYPE = ?)) ORDER BY
RECEIVE_DATE DESC) a WHERE ROWNUM <= ?) WHERE rnum > ?
bind
=> [bpel, 0, 1, 2012-10-31 12:07:36.044, 1,
1, 0]
2.12.2Dashboard - Recent fault and rejected messages
SELECT * FROM (SELECT
/*+ FIRST_ROWS */ a.*, ROWNUM rnum from (SELECT f.CIKEY, f.NODE_ID,
f.SCOPE_ID, f.COUNT_ID, f.FAULT_NAME, f.FAULT_TYPE, f.POLICY_NAME,
f.POLICY_VERSION, f.POLICY_CATEGORY, f.POLICY_ECID, f.CREATION_DATE,
f.MODIFY_DATE, f.MESSAGE, wi.LABEL, wi.CREATOR, wi.MODIFIER, wi.STATE, ci.ECID,
ci.CMPST_ID, ci.DOMAIN_NAME, ci.COMPOSITE_NAME, ci.COMPONENT_NAME,
ci.COMPOSITE_REVISION, ci.COMPOSITE_LABEL FROM CUBE_INSTANCE ci, WI_FAULT f
LEFT JOIN WORK_ITEM wi ON f.CIKEY = wi.CIKEY AND f.NODE_ID = wi.NODE_ID AND
f.SCOPE_ID = wi.SCOPE_ID AND f.COUNT_ID = wi.COUNT_ID WHERE ci.CIKEY
= f.CIKEY AND ci.COMPONENTTYPE = ? AND (f.FAULT_TYPE is null OR
f.FAULT_TYPE = ?) AND ci.STATE != 9 AND (wi.STATE is null OR wi.STATE IN (9, 4,
13, 3)) ORDER BY f.CREATION_DATE DESC) a where ROWNUM
< ? ) where rnum > ?
bind
=> [bpel, {http://schemas.oracle.com/bpel/extension}RuntimeFaultMessage, 5,
0]
2.12.3Dashboard - Recent composite Instance queries –
SELECT /*+
FIRST_ROWS(40) */ ID, CONVERSATION_ID, HAS_ASSOC, PARENT_ID, UPDATED_TIME,
CREATED_BY, ECID, TITLE, TEST_RUN_NAME, INDEX5, TEST_RUN_ID, INDEX3, TEST_SUITE,
INDEX1, TEST_CASE, BATCH_INDEX, SOURCE_NAME, COMPOSITE_DN, SOURCE_TYPE,
CREATED_TIME, SOURCE_ACTION_TYPE, INDEX6, SOURCE_ACTION_NAME, INDEX2, STATE,
BATCH_ID, LIVE_INSTANCES, TAGS, STATE_COUNT, BUSINESS_STATUS, VERSION, INDEX4,
PARTITION_DATE, UPDATED_BY, TENANT_ID FROM COMPOSITE_INSTANCE WHERE
(CREATED_TIME >= :1 ) ORDER BY CREATED_TIME DESC
We got this query from
AWR report and our guess is it’s being used for “recent instance” section,
2.12.4Instance tab page -
Again our guess is
same query which is getting used for Dashboard tab page instance section is
getting used for "Instance" tab page as well
SELECT /*+
FIRST_ROWS(40) */ ID, CONVERSATION_ID, HAS_ASSOC, PARENT_ID, UPDATED_TIME,
CREATED_BY, ECID, TITLE, TEST_RUN_NAME, INDEX5, TEST_RUN_ID, INDEX3, TEST_SUITE,
INDEX1, TEST_CASE, BATCH_INDEX, SOURCE_NAME, COMPOSITE_DN, SOURCE_TYPE,
CREATED_TIME, SOURCE_ACTION_TYPE, INDEX6, SOURCE_ACTION_NAME, INDEX2, STATE,
BATCH_ID, LIVE_INSTANCES, TAGS, STATE_COUNT, BUSINESS_STATUS, VERSION, INDEX4,
PARTITION_DATE, UPDATED_BY, TENANT_ID FROM COMPOSITE_INSTANCE WHERE
(CREATED_TIME >= :1 ) ORDER BY CREATED_TIME DESC
2.12.5Instance tab page search query based on name vs title-
SELECT /*+
FIRST_ROWS(50) */ ID, CONVERSATION_ID, HAS_ASSOC, PARENT_ID, UPDATED_TIME,
CREATED_BY, ECID, TITLE, TEST_RUN_NAME, INDEX5, TEST_RUN_ID, INDEX3,
TEST_SUITE, INDEX1, TEST_CASE, BATCH_INDEX, SOURCE_NAME, COMPOSITE_DN,
SOURCE_TYPE, CREATED_TIME, SOURCE_ACTION_TYPE, INDEX6, SOURCE_ACTION_NAME,
INDEX2, STATE, BATCH_ID, LIVE_INSTANCES, TAGS, STATE_COUNT, BUSINESS_STATUS,
VERSION, INDEX4, PARTITION_DATE, UPDATED_BY, TENANT_ID FROM COMPOSITE_INSTANCE
WHERE (TITLE LIKE :1 ) ORDER BY CREATED_TIME DESC
2.12.6Fault and Rejected message tab page –
For this tab page we
could see one parent query runs which fetch initial 40 records and then for
each records its run another child sql query-
2.12.6.1 Parent query-
SELECT * FROM (SELECT
/*+ FIRST_ROWS */ a.*, ROWNUM rnum from (SELECT f.CIKEY, f.NODE_ID,
f.SCOPE_ID, f.COUNT_ID, f.FAULT_NAME, f.FAULT_TYPE, f.POLICY_NAME,
f.POLICY_VERSION, f.POLICY_CATEGORY, f.POLICY_ECID, f.CREATION_DATE,
f.MODIFY_DATE, f.MESSAGE, wi.LABEL, wi.CREATOR, wi.MODIFIER, wi.STATE, ci.ECID,
ci.CMPST_ID, ci.DOMAIN_NAME, ci.COMPOSITE_NAME, ci.COMPONENT_NAME,
ci.COMPOSITE_REVISION, ci.COMPOSITE_LABEL
FROM CUBE_INSTANCE ci,
WI_FAULT f LEFT
JOIN WORK_ITEM wi ON
f.CIKEY = wi.CIKEY AND f.NODE_ID = wi.NODE_ID AND f.SCOPE_ID = wi.SCOPE_ID AND
f.COUNT_ID = wi.COUNT_ID WHERE ci.CIKEY = f.CIKEY AND
ci.COMPONENTTYPE = ? AND ci.STATE != 9 AND (wi.STATE is null OR wi.STATE IN (9,
4, 13, 3)) ORDER BY f.CREATION_DATE DESC) a where ROWNUM
< ? ) where rnum > ?
bind
=> [bpel, 40, 0]
2.12.6.2 Child query –
SELECT F.CIKEY,
F.NODE_ID, F.SCOPE_ID, F.COUNT_ID, F.FAULT_NAME,
F.FAULT_TYPE,F.CREATION_DATE,F.MODIFY_DATE,
F.MESSAGE,
WI.LABEL, WI.CREATOR,WI.MODIFIER,WI.STATE,CI.ECID,CI.CMPST_ID
FROM
WI_FAULT F LEFT JOIN WORK_ITEM WI ON F.CIKEY = WI.CIKEY
AND
F.NODE_ID = WI.NODE_ID AND F.SCOPE_ID = WI.SCOPE_ID AND F.COUNT_ID =
WI.COUNT_ID
LEFT
JOIN CUBE_INSTANCE CI ON F.CIKEY = CI.CIKEY WHERE F.CIKEY = ? AND F.NODE_ID = ?
AND
F.SCOPE_ID = ? AND F.COUNT_ID = ?
bind
=> [2050981, BpInv1, BpSeq3.17, 2]
SELECT F.CIKEY,
F.NODE_ID, F.SCOPE_ID, F.COUNT_ID, F.FAULT_NAME,
F.FAULT_TYPE,F.CREATION_DATE,F.MODIFY_DATE,
F.MESSAGE,
WI.LABEL, WI.CREATOR,WI.MODIFIER,WI.STATE,CI.ECID,CI.CMPST_ID
FROM
WI_FAULT F LEFT JOIN WORK_ITEM WI ON F.CIKEY = WI.CIKEY
AND
F.NODE_ID = WI.NODE_ID AND F.SCOPE_ID = WI.SCOPE_ID AND F.COUNT_ID =
WI.COUNT_ID
LEFT
JOIN CUBE_INSTANCE CI ON F.CIKEY = CI.CIKEY WHERE F.CIKEY = ? AND F.NODE_ID = ?
AND
F.SCOPE_ID = ? AND F.COUNT_ID = ?
bind
=> [2050969, BpInv6, BpSeq19.39, 2]
2.13 Change13: Created following Index to improve
the EM queries execution response time–
Index Creation
|
||
Table Name
|
Column name
|
Index type
|
COMPOSITE_INSTANCE
|
created_time
|
Normal Index
|
CUBE_INSTANCE
|
CPMST_ID
|
Normal Index
|
COMPOSITE_INSTANCE
|
TITLE
STATE
|
Normal Index
Normal Index
|
DLV_MESSAGE
|
STATE
DLV_TYPE
|
Composite Index
|
Change14: Explored direct links to access the EM console Instance
tab page, fault and recovery tab page, dashboard tab etc.
If we login into EM console via normal login procedure, it takes
quite a long time to navigate from Dashboard tab page to Instance tab page etc.
To avoid this delay we have provide direct link for accessing these page based
on user need-
2.13.1Instance Tab Page
&type=oracle_soainfra&selectedTab=instancesTab
N.B.
In above string replace hostname, domain name, port number and
target value to prepare the URL for your environment, target name can be pulled
out from this path- right click on “soa-infra(mservername) >> click to
general >> a dialogue box will appear, copy the target name from there
and replace in above string. Refer the above screen shot for path.
2.13.2Fault and Recovery Tab
page
&type=oracle_soainfra&selectedTab=faultsTab
2.13.3Dashboard Tab Page
&type=oracle_soainfra&selectedTab=compositeTabs
2.14 Change 15: Manually execution of Stats pack at
Database level for SOA_INFRA partition tables.
One more major impacting solution we have adapted to auto
schedule stats collection process for partition tables.
During our analysis of AWR report, we have identified the long
running queries and also gone through the explain plan of those query and concluded
that SQL optimizer is not working as expected and possible reason behind that
automatic database stats collection process updates the latest partition of
SOA_INFRA tables where as the old partitions SOA_INFRA tables remains
unchanged.
When EM runs the queries it gets two different explain plans
according to the data exist into separate partition tables as result it take
longer time to response.
In order to overcome
this situation we have implemented manual stats collection on Tables level,
rather than having stats collection at partition level for table. A cron job
now run every mid night which does table scan to identify data changes %, if
more than 5% data got changed in table it does the stats collection for that
table. SOA_INFRA table list where stats collection process been schedule is
given below-
2.14.1 Table List where DB Manual Stats collection is required -
TABLES PARTITIONED
|
AUDIT_COUNTER
|
AUDIT_DETAILS
|
AUDIT_TRAIL
|
CI_INDEXES
|
COMPOSITE_INSTANCE
|
COMPOSITE_INSTANCE_ASSOC
|
COMPOSITE_INSTANCE_FAULT
|
CUBE_INSTANCE
|
CUBE_SCOPE
|
DLV_MESSAGE
|
DLV_SUBSCRIPTION
|
DOCUMENT_CI_REF
|
DOCUMENT_DLV_MSG_REF
|
HEADERS_PROPERTIES
|
REFERENCE_INSTANCE
|
WI_FAULT
|
WLI_QS_REPORT_ATTRIBUTE
|
WORK_ITEM
|
XML_DOCUMENT
|
2.14.2 Call the below SQL statement to forcefully
initiate DB stats collection for a table –
EXEC
DBMS_STATS.gather_table_stats('SOAP_SOAINFRA', 'AUDIT_COUNTER',estimate_percent
=> 5);
Conclusion
Ultimately EM Stuck Thread problem and slowness problem got
resolved; it took around 3 + months in our production environment to do all the
changes in multiple deployment windows, now multiple users are accessing EM
console and able to do their BAU work. Only regular purging is required for
SOA_INFRA database to keep EM console stable in future as well.
Thanks to Oracle support guys and all my colleagues who were
part of this exercise.
A special thanks goes to the Mr. Deepak Arora (Director, SOA/B2B
Fusion Middleware Architects Team, “The A- Team”), Oracle to provide his
supervision and drive this exercise. We got very useful information from him
which enabled us to make environment stable.
Weblogic And Soa Administrator: Em Console Slowness And Stuck Thread Issue >>>>> Download Now
ReplyDelete>>>>> Download Full
Weblogic And Soa Administrator: Em Console Slowness And Stuck Thread Issue >>>>> Download LINK
>>>>> Download Now
Weblogic And Soa Administrator: Em Console Slowness And Stuck Thread Issue >>>>> Download Full
>>>>> Download LINK rp