Feed aggregator

I See Systems

Floyd Teter - Mon, 2019-12-23 14:46
Working in enterprise software has changed my perspective on the world around me... I see everything through more of a systemic lens.

My wife recently had knee replacement surgery.  World-renowned surgeon performing the surgery.  But done at his 2nd choice of hospitals due to limitations of our health insurance policy.  Good hospital, but not his hospital of choice.  Like everything else involving health care in the U.S., it's a money thing.

The surgery was originally scheduled for Friday, December 13th but postponed until Monday, December 16th.  On the 16th, we were asked to arrive at the hospital at 11 am for a surgery planned at 1 pm.  But the surgery did not begin until 6 pm.  At this point, I'm thinking that these guys have no idea about linear programming and managing their supply chain.

While my wife was in surgery, I was directed to a "Surgery Waiting Room".  The room allegedly had an attendant and a television screen for monitoring patient status during surgery.  When I entered the waiting room, I immediately noticed the attendant's desk was unoccupied.  Seems that the attendant's shift ends at 5 pm - those waiting on surgeries beyond 5 pm are asked to answer the phones on a volunteer basis.  The idea being that surgical staff will call the waiting room to update family and friends when surgery is completed, and to alert those waiting that the surgeon is on the way up with a more detailed debrief.  So, being the volunteering type, I sit at the attendant's desk and start answering the phone.  After about three incoming calls, I notice a trend - no friends or family in the room to receive the updates.  And I'm also noticing surgeons coming into the room looking for friends and family who are not there.  I put the phone to use and make a few inquiries within the hospital - and discover that there are three surgery waiting rooms in this tower alone (the hospital in question has seven towers).  Neither the surgical staff nor the surgeons have any idea which waiting room contains friends and family for any particular patient, as they have no waiting room check-in function.  So they're literally hunting for the pea under any one of three shells as they go from waiting room to waiting room searching for friends and family of a particular patient.  Low-key chaos.

Another observation from the surgical waiting room:  the status screen showing patient status is listed by case number rather than patient name.  I did not have a case number for my wife's surgery.  And a quick survey of the folks in the waiting room indicated that none of them had a case number either.  So the status screen was useless to those of us waiting on surgical outcomes.  During the next few incoming calls from surgical staff, I asked for case numbers.  But the surgical staff had no idea of the case numbers either.  What in the world?

Fortunately, the surgery went well, the outcome was positive, and Marlene is now home working through the long recovery process.

After rolling the surgical experience over in my head, I decided to view this experience as a systemic failure for this particular hospital.  Lots of good people with lots of energy all trying to do the right thing, but nobody has tied it all together.  So how might I suggest improving things from a system point of view?

First, I'd establish personas:  the patient, the family member/friend of the patient, the surgical staff member, the attendant, the pre-op nurse, the surgeon, and so on.  Then I'd walk through the entire process, from the time a patient walks in the door until they leave post-op recovery.  And I'd do it for each persona.  Essentially build a systemic "As Is" flow.  Then design how we want things to work, review the changes with the people doing the work, then implement.

Sounds a bit like an enterprise applications implementation project, doesn't it?  Well, enterprise applications are all about implementing systems.  Like I said, working in enterprise software has changed my perspective of the world around me...

How about you?  Ever have a similar experience?  Healthcare?  The DMV? Retail returns?  Sound off in the comments about your experience and how you'd approach a fix.

To compare two same tables from different schema without primary key and not same number of columns

Tom Kyte - Fri, 2019-12-20 08:55
We have a table 'CUSTOMER' in two different schemas. Neither has a primary key, and the number of columns does not match (i.e., the schema1 table can have 97 columns while the other schema's table has 101). The column names that are present are the same in ...
Categories: DBA Blogs

PL/SQL code in packages on DB vs on APEX sites - how it affect performance?

Tom Kyte - Fri, 2019-12-20 08:55
Dear Tom, As you develop in APEX you can move all your PL/SQL code into packages or you can put it all in APEX. How does this affect performance? I know that it is better to move code to packages to make quick changes and have more control over the code. ...
Categories: DBA Blogs

PLSQL

Tom Kyte - Fri, 2019-12-20 08:55
Requirement: if the table does not have records then they need to be inserted; if records already exist, only the END DATE column needs to be updated with a number of years based on the Term (for example, if the Term is 10 years, then 10*12 = 120 months, meaning 10 years needs to be added to the END DATE col...
Categories: DBA Blogs

Merge Delete

Tom Kyte - Fri, 2019-12-20 08:55
How do I delete with a merge statement? I want to update MERGE_DELETE_TEST2 to match MERGE_DELETE_TEST1 (think ETL). I cannot get merge delete to remove the row that exists in MERGE_DELETE_TEST2 that does not exist in MERGE_DELETE_TEST1.
Categories: DBA Blogs

Import Production Dump to new schema

Tom Kyte - Fri, 2019-12-20 08:55
Hi, We have a live project which has only 1 schema. We are upgrading some features and so we are going to release it as version 2.0. My question is that the production dump, which is only one schema, has to be imported into 4 different schema o...
Categories: DBA Blogs

Check Constraints and Explain Plan Filter Predicates

Tom Kyte - Fri, 2019-12-20 08:55
Why does the Oracle SQL Optimizer include a filter predicate for a Check Constraint when generating an execution plan for a SELECT statement? If the constraint is valid (according to DBA_CONSTRAINTS), then the table rows are all compliant with the co...
Categories: DBA Blogs

Dynamic filters and arriving bind variables for them.

Tom Kyte - Fri, 2019-12-20 08:55
Team, we have an application that is used to search using any kind of filters on any columns - something like below. The procedure is used to return the resultset to the application, based on the WHERE clause being passed as input. When running...
Categories: DBA Blogs

Error while relocating database service

Tom Kyte - Fri, 2019-12-20 08:55
Hello, Ask Tom Team. <b>My environment: </b> I have a database running on 2-node RAC. I created a database service with TAF and transaction guard srvctl add service -db dbprod -service dbprod1_xa -preferred dbprod1 -available dbprod2 -fail...
Categories: DBA Blogs

Purge Cursor

Jonathan Lewis - Fri, 2019-12-20 07:54

This is a note I first drafted about 5 years ago (the date stamp says March 2014) and rediscovered a few days ago when the question came up on a Twitter thread.

How do you purge a single SQL statement from the library cache without having to execute “alter system flush shared_pool”?

The answer is in the package dbms_shared_pool, specifically the purge() procedure. This package changes significantly in the upgrade from 11.2 (manual page here) to 12.1 (manual page here) so it's best to check the reference manual for the version you're using in case it changes again.  In 11.2 (and earlier) there's just one option for the purge() procedure but in 12.1 the package gets 3 overloaded versions of the procedure – and one of the overloads gets an extra parameter (edition) by 19c.

Side note: In very early versions of Oracle the package wasn't installed automatically, so you may have to execute $ORACLE_HOME/rdbms/admin/dbmspool.sql (possibly followed by prvtpool.plb) to install it. The facility to purge a cursor appeared in 11.1 and was then back-ported to 10.2.0.4. The manual pages for the procedure are, however, not up to date and don't list all the possible flags that tell the procedure what type of object it is supposed to be purging.
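
If you do find yourself on one of those very old versions, a minimal sketch of the manual installation (run as SYSDBA; the "?" is SQL*Plus shorthand for $ORACLE_HOME, and the exact path may vary by version) would be:

connect / as sysdba
@?/rdbms/admin/dbmspool.sql
@?/rdbms/admin/prvtpool.plb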

The only use I’ve made of the purge() procedure is to purge a cursor from memory, though you can purge other types of object if you want to. Technically you could flush the execution plan from memory without eliminating the cursor, though you would still have to re-optimize the statement so there may be no benefit (or very little benefit) in doing so.

To demonstrate the mechanism I’m going to use three sessions – two to run a query with different optimizer environments, then a third to find and purge the cursors. Here’s the code for the first two sessions:


rem
rem     Script: purge_cursor.sql
rem     Dated:  March 2014
rem     Author: Jonathan Lewis
rem
rem     Last tested
rem             12.2.0.1
rem             11.2.0.4
rem             10.2.0.4  -- with variations to get  two child cursors
rem

/*    To be run by session 1    */

create table t1 as select * from all_objects where rownum <= 10000;
create index t1_i1 on t1(object_id) invisible;

alter session set optimizer_use_invisible_indexes = true;

set serveroutput off
select object_name from t1 where object_id = 250;
select * from table(dbms_xplan.display_cursor);

/*    To be run by session 2    */

set serveroutput off
select object_name from t1 where object_id = 250;
select * from table(dbms_xplan.display_cursor);

You'll see that I've created an invisible index on the table, then allowed one session to use invisible indexes while the other session isn't allowed to. As a consequence session 1 will produce an execution plan that shows an index range scan for child cursor 0, and session 2 will produce an execution plan that shows a table scan for child cursor 1. I won't show the output from these two sessions, but my calls to dbms_xplan reported the sql_id as 'ab08hg3s62rpq' and I've used that as the value for a substitution variable in the code to be run by session 3.

The final demo script is a little messy because it's going to attempt to report all the execution plans for that sql_id three times, but it will also write and execute a script to call the purge() procedure twice – once to eliminate the plans but leave the cursors in place, then a second time to purge the cursors.

define m_sql_id = 'ab08hg3s62rpq'

spool purge_cursor

prompt  =============================
prompt  Before purge- 2 child cursors
prompt  =============================

select * from table(dbms_xplan.display_cursor('&m_sql_id', null));

spool off

set verify off
set feedback off
set heading off

spool temp_purge.sql

select 
        q'{ execute dbms_shared_pool.purge('}' ||
                address || ',' || hash_value || 
                q'{', 'C', 64) }'
from 
        V$sqlarea 
where 
        sql_id = '&m_sql_id'
;

spool off

@temp_purge

set heading on
set feedback on
set verify on

spool purge_cursor append

prompt  =======================================
prompt  After heap 6 purge- 2 cursors, no plans
prompt  =======================================

select * from table(dbms_xplan.display_cursor('&m_sql_id', null));

spool off

set verify off
set feedback off
set heading off

spool temp_purge.sql

select 
        q'{ execute dbms_shared_pool.purge('}' ||
                address || ',' || hash_value || 
                q'{', 'C') }'
from 
        V$sqlarea 
where 
        sql_id = '&m_sql_id'
;

spool off

@temp_purge

set heading on
set verify on
set feedback on

spool purge_cursor append

prompt  =================================
prompt  After complete purge - no cursors
prompt  =================================

select * from table(dbms_xplan.display_cursor('&m_sql_id', null));

On the first pass the select from v$sqlarea produces a script with the following line:

 execute dbms_shared_pool.purge('000000008D3B4FD0,4032913078', 'C', 64)

On the second pass the select produces a script with the following line:

 execute dbms_shared_pool.purge('000000008D3B4FD0,4032913078', 'C')

The effect of the first call is to purge heap 6 (power(2,6) = 64) for any child cursors that exist for the sql_id. The effect of the second call is to purge the entire set of child cursors. The purge_cursor.lst file ends up with the following results:


=============================
Before purge- 2 child cursors
=============================

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  ab08hg3s62rpq, child number 0
-------------------------------------
select object_name from t1 where object_id = 250

Plan hash value: 3320414027

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    25 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=250)

SQL_ID  ab08hg3s62rpq, child number 1
-------------------------------------
select object_name from t1 where object_id = 250

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    27 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    25 |    27   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=250)


37 rows selected.

=======================================
After heap 6 purge- 2 cursors, no plans
=======================================
old   1: select * from table(dbms_xplan.display_cursor('&m_sql_id', null))
new   1: select * from table(dbms_xplan.display_cursor('ab08hg3s62rpq', null))

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  ab08hg3s62rpq, child number 0

select object_name from t1 where object_id = 250

NOTE: cannot fetch plan for SQL_ID: ab08hg3s62rpq, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)

SQL_ID  ab08hg3s62rpq, child number 1

select object_name from t1 where object_id = 250

NOTE: cannot fetch plan for SQL_ID: ab08hg3s62rpq, CHILD_NUMBER: 1
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)


16 rows selected.

=================================
After complete purge - no cursors
=================================
old   1: select * from table(dbms_xplan.display_cursor('&m_sql_id', null))
new   1: select * from table(dbms_xplan.display_cursor('ab08hg3s62rpq', null))

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID: ab08hg3s62rpq cannot be found


2 rows selected.

As you can see we start with two child cursors and two different execution plans, then the two child cursors “lose” their execution plans, and finally the cursors disappear completely. (And the parent disappears at the same time.)

Presenting the procedure with a little more formality – the formal declaration of the procedure that I’ve been using reads:


procedure purge(
        name    varchar2, 
        flag    char    DEFAULT 'P', 
        heaps   number  DEFAULT 1
)

To purge a cursor we set the flag to 'C' (or 'c') and the name to '{address},{hash_value}' (make sure you don't get extra spaces in that expression). If we want to purge just the execution plan we need to target heap 6, which means setting the heaps value to power(2,6). Different types of object use different (sub)heaps, so if you want to delete multiple heaps you need to add together the appropriate power(2,N); the default value for heaps is 1, which equates to heap 0, which means the whole object.
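
As a quick illustrative sketch (re-using the address,hash_value pair reported earlier; the values are obviously installation-specific):

 execute dbms_shared_pool.purge('000000008D3B4FD0,4032913078', 'C', 64)  -- heap 6 only, i.e. just the execution plan: power(2,6)
 execute dbms_shared_pool.purge('000000008D3B4FD0,4032913078', 'C')      -- heaps defaults to 1, i.e. heap 0, i.e. the whole cursor
 -- combining heaps is just addition, e.g. heaps 2 and 6 together would be power(2,2) + power(2,6) = 68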

For other types of object there is a reference list (under the keep() procedure) that expands on the manuals to give us the following possible values of flag:


  --        Value        Kind of Object to {keep}
  --        -----        ------------------------
  --          P          package/procedure/function
  --          Q          sequence
  --          R          trigger
  --          T          type
  --          JS         java source
  --          JC         java class
  --          JR         java resource
  --          JD         java shared data
  --          C          cursor

You will note, of course, that I've used v$sqlarea rather than v$sql when I was searching for the address and hash value. When you purge a cursor you purge every child cursor; you can't identify an individual child. Even if you query v$sql instead of v$sqlarea, and use a single child_address instead of the (parent) address, the purge() procedure will still purge every child for the parent.

Warning

There is a comment in my original notes about a bug that was fixed by 11.2.0.4 – if any of the child cursors is currently executing then the purge() procedure will go into a loop waiting on “cursor: pin X”, timing out every 1/100 second. Unfortunately my notes didn’t make any explicit comment about what impact this had on any session trying to parse or execute any of the children for that parent cursor – but I suspect that you’d end up with a nasty race condition and an apparent hang.

OT Footnote

I've decided this year to donate to a charity that works to reduce infant mortality rates in Nepal with a two-pronged attack on malnutrition: feeding starving children, then educating their parents on how to make the best use of local resources to grow the most appropriate crops and use the best preparation methods to produce nourishing meals in the future. (They also run other projects to improve the lives of the young people in Nepal – here's a link to their home page, and a direct link to a 4 minute video that gives you a quick insight into what they do and how they do it.)

If you’re thinking of making any small donations to charity over the next few weeks, please think of this one. To make your donation more valuable I’ve set up a justgiving page and will match any donations made there, up to a total of £1,000.

ASC 842 :Audit Procedures for Leases

OracleApps Epicenter - Fri, 2019-12-20 03:59
The new Financial Accounting Standards Board (FASB) and International Financial Reporting Standards (IFRS) lease accounting standards (ASC 842 and IFRS 16) will take effect in 2020 for private companies. The standards bring many leases onto the balance sheet and could significantly impact a business’ financial statements. Here is a great post on Audit Procedures for Leases […]
Categories: APPS Blogs

Microsoft and Oracle Expand Interoperability Partnership to Canada

Oracle Press Releases - Thu, 2019-12-19 07:00
Blog
Microsoft and Oracle Expand Interoperability Partnership to Canada

By Vinay Kumar, vice president, product management, Oracle Cloud Infrastructure—Dec 19, 2019

Today, we are announcing the continued expansion of our cloud interoperability partnership with Microsoft to help joint customers worldwide run their mission-critical workloads across Oracle Cloud and Microsoft Azure. Our new interconnect location means enterprises can now build workloads that seamlessly interoperate between Microsoft and Oracle cloud regions in Canada. This interconnect builds on an existing partnership announced in June of 2019.

The partnership has received a huge amount of interest, as 80 percent of enterprises use a combination of Microsoft and Oracle software to run their businesses. As cloud computing becomes ubiquitous, and businesses rely on multiple cloud providers, the partnership makes managing companies’ most important cloud workloads significantly easier. These workloads include financial planning, inventory, sales applications – and their underlying databases.

The expansion will give more customers direct, fast and highly reliable network connectivity between Microsoft Azure and Oracle Cloud, while providing first-class customer service and support that enterprises have come to expect from the two companies. This unique multi-cloud solution delivers the performance, easy integration, rigorous service level agreements, and collaborative enterprise support that they need to simplify their operations. 

Simply put, the Oracle-Microsoft partnership means cloud services run by the two providers will interoperate as if they were part of a single cloud, making it easier for customers to run their mission-critical workloads across the two clouds.

“The global demand for running applications and databases in multi-cloud environments continues to accelerate,” said Clay Magouyrk, senior vice president of engineering, Oracle Cloud Infrastructure.  “With the new interconnect, our Canadian customers can now take advantage of a nearly seamless cloud integration between the world's largest enterprise cloud providers, Microsoft and Oracle.”

More Regions to Reach More Customers

The two companies are putting customers first by enabling them to run full-stack applications side-by-side across clouds, or one part of a workload within Azure and another part of the same workload within Oracle Cloud. For example, using the interconnect makes it possible to connect Azure services like analytics and AI to Oracle Cloud services like Autonomous Database. Together, Azure and Oracle Cloud offer customers a one-stop shop for all the cloud services and applications they need to run their entire business.

From a technical perspective, the interconnect means less latency or delay, which enables better data transfer and application interaction between clouds. It also supports a broader spectrum of workloads, using resources available on both sides. Accenture recently performed testing on the performance of the interconnect and confirmed that the solution offers customers low latency and high ease of use.

Microsoft and Oracle plan to make the direct interconnect available in additional regions, including on the US West Coast, in a US Government specific region, in Asia, and in the European Union.

Earlier this year, Oracle and Microsoft created an interconnect in Ashburn (North America), Azure US East, and in London (United Kingdom).

London March 2020: “Oracle Indexing Internals and Best Practices” and “Oracle Performance Diagnostics and Tuning” Seminars

Richard Foote - Thu, 2019-12-19 02:06
Places are filling up, but there are still some available at both of my acclaimed seminars that I’ll be running in London, UK in March 2020. The dates and registration links are as follows: 23-24 March 2020: “Oracle Indexing Internals and Best Practices” seminar – Tickets and Registration Link 25-26 March 2020: “Oracle Performance Diagnostics and […]
Categories: DBA Blogs

A Ruthless Repository Shutdown Utility, Part II

Yann Neuhaus - Wed, 2019-12-18 15:51
Stopping the unreachable repositories

Suppose that the docbroker has been stopped prematurely and that we want to shut down the repositories, but the out-of-the-box dm_shutdown_repository script is not effective. Why is that, by the way? If we look closely inside the shutdown script, we quickly notice the reason:

#!/bin/sh
################## DOCUMENTUM SERVER SHUTDOWN FILE ######################
#
# 1994-2018 OpenText Corporation. All rights reserved
# Version 16.4 of the Documentum Server.
#
# A generated server shutdown script for a repository.
# This file was generated on Fri Aug 30 12:15:10 CEST 2019 by user dmadmin.
#
check_connect_status() {
status=$1
if [ ! $status = 0 ] ; then
  cat <<-END
  ***** $0: ERROR
  ***** Unable to complete shutdown - unable to connect
  ***** to the server to issue $2 request.
END
  exit 1
fi
}
...
# Stop the server
echo Stopping Documentum server for repository: [dmtestgr02]
echo ''
DM_DMADMIN_USER=dmadmin
#
# Get the pid for the root process
#
DM_PID=`./iapi dmtestgr02 -U$DM_DMADMIN_USER -P -e << EOF  | grep 'root_pid' | sed -e 's/ .*[: A-Za-z]//'
apply,s0,NULL,LIST_SESSIONS
next,s0,q0
dump,s0,q0
exit
EOF`
status=$?
check_connect_status $status LIST_SESSIONS
...
            kill -9 $child_pid
...
  kill -9 $DM_PID
...
         kill -9 $child_pid
...

On line 29, the shutdown script first attempts to connect to the repository in order to retrieve the root pid of the server processes. On line 36, this attempt’s result is checked by the function check_connect_status defined earlier in the script at line 10. If something went wrong during the connection, iapi’s return status will be != 0 and check_connect_status will simply exit the script on line 18. So, if a repository has gone berserk, or no free sessions are available, or the docbroker is unreachable, the script will not be able to stop it. That logic is quite restrictive and we must fall back to killing the repository’s processes ourselves anyway.
Strangely enough, the script is not scared of killing processes; it does this from several places. But it looks a bit shy about identifying the right ones and therefore relies on the server itself or, ultimately, on the user, for help in this area.
Admittedly, it is not always easy to pinpoint the right processes from the list returned by the command ps, especially if the repository is running in HA on the same machine, or if several repositories share the same machine, so extra care must be taken in order not to kill the wrong ones. The dm_shutdown_docbase script avoids this difficulty altogether by asking the content server (aka CS) for its root pid, and that is why it aborts if it cannot contact it.
Historically, the “kill” command could only “kill -9” (SIGKILL, forceful, coercive kill) but nowadays it has been generalized to send signals and could just as well have been forked to “signal” or “send”. So, can a signal be sent to the main executable ${DM_HOME}/bin/documentum to ask it to cleanly shut down the repository? We wish, but this has not been implemented. Signals such as SIGQUIT, SIGTRAP, SIGINT and SIGABRT are trapped indeed, but will only kill the server after printing to the server's log the last executed SQL or the call stack trace, e.g. after a SIGINT was sent:

2019-10-11T13:14:14.045467 24429[24429] 0100c35080004101 Error: dm_bear_trap: Unexpected exception, (SIGINT: interrupt: (2) at (Connection Failure)), during new session creation in module dmapply.cxx after line 542. Process exiting.
Last SQL statement executed by DB was:
 
 
Last SQL statement executed by DB was:
Last SQL statement executed by DB was:
 
 
 
 
Last SQL statement executed by DB was:
 
 
(23962) Outer Exception handler caught exception: SIGINT: interrupt: (2) at (RPC MAIN)

Thus, a corruption is theoretically possible while using any of those signals, just as it is when a SIGKILL signal is issued.
According to OTX Support, a trap handler that shuts down the repository cleanly has not been implemented because it needs a session to invoke the shutdown server method. OK, but what if a hidden session were opened at startup time and kept around just for such administrative cases? How about a handler that immediately forces a projection to the available docbrokers instead of waiting for the next checkpoint cycle? As you can see, there are ways to make the shutdown more resilient, but my overall feeling is that there is a lack of willingness to improve the content server.
Therefore, if waiting about 5 minutes for the repository to project to a docbroker is not acceptable, there is no alternative other than to kill -9 the repository's processes, start the docbroker(s) and then the repository. Other signals can work, but not always, and are not any safer.
In order to use that command, one needs to know the content server's root pid and, since the CS does not accept any connection at this point, one must get it from another source. Once the root pid is available, it can be given to the kill command with a slight subtlety: in order to include its child processes, the root pid must be negated, e.g.:

# use the standalone /bin/kill command;
$ /bin/kill --signal SIGKILL -12345
# or use bash's kill builtin:
$ command kill -s SIGKILL -12345

This will transitively kill the process with pid 12345 and all the others in the same group, which are the ones it started itself, directly or indirectly.
If a numeric signal is preferred, the equivalent command is:

$ /bin/kill -9 -12345

I leave it to you to decide which one is more readable.
So now, we need to identify the repository's root process. Once found, we can send the SIGKILL signal to its negated value, which will propagate to all the child processes. Let's see how to do that.

Identifying the content server’s root process

Ordinarily, the LIST_SESSIONS server method returns a collection containing the root_pid attribute among other valuable information, e.g.:

API> apply,c,NULL,LIST_SESSIONS
...
q0
API> next,c,q0
...
OK
API> dump,c,q0
...
USER ATTRIBUTES
 
  root_start                      : 12/11/2019 22:53:19
  root_pid                        : 25329
  shared_mem_id                   : 2588691
  semaphore_id                    : 0
  session                      [0]: 0100c3508000a11c
                               [1]: 0100c3508000a102
                               [2]: 0100c3508000a101
  db_session_id                [0]: 272
                               [1]: 37
                               [2]: 33
  typelockdb_session_id        [0]: -1
                               [1]: -1
                               [2]: -1
  tempdb_session_ids           [0]: -1
                               [1]: 45
                               [2]: 36
  pid                          [0]: 17686
                               [1]: 26512
                               [2]: 26465
  user_name                    [0]: dmadmin
                               [1]: dmadmin
                               [2]: dmadmin
  user_authentication          [0]: Trusted Client
                               [1]: Password
                               [2]: Trusted Client
  client_host                  [0]: docker
                               [1]: 172.19.0.3
                               [2]: docker
  client_lib_ver               [0]: 16.4.0070.0035
                               [1]: 16.4.0070.0035
                               [2]: 16.4.0070.0035
...

But in our case, the CS is not reachable so it cannot be queried.
An easy alternative is to simply look into the CS’s log:

dmadmin@docker:/app/dctm$ less /app/dctm/dba/log/dmtest.log
 
    OpenText Documentum Content Server (version 16.4.0080.0129  Linux64.Oracle)
    Copyright (c) 2018. OpenText Corporation
    All rights reserved.
 
2019-12-11T22:53:19.757264      25329[25329]    0000000000000000        [DM_SERVER_I_START_SERVER]info:  "Docbase dmtest attempting to open"
 
2019-12-11T22:53:19.757358      25329[25329]    0000000000000000        [DM_SERVER_I_START_KEY_STORAGE_MODE]info:  "Docbase dmtest is using database for cryptographic key storage"
...

The number 25329 is the root_pid. It can be extracted from the log file as shown below:

$ grep "\[DM_SERVER_I_START_SERVER\]info" /app/dctm/dba/log/dmtest.log | gawk '{if (match($2, /\[[0-9]+\]/)) {print substr($2, RSTART + 1, RLENGTH - 2); exit}}'
25329
# or compacter:
gawk '{if (match($0, /\[([0-9]+)\].+\[DM_SERVER_I_START_SERVER\]info/, root_pid)) {print root_pid[1]; exit}}' /app/dctm/dba/log/dmtest.log
25329

The extracted root_pid can be confirmed by the ps command with options ajxf showing a nice tree-like view of the running processes. E.g.:

dmadmin@docker:/app/dctm$ ps_pgid 25329
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1 25329 25329 25329 ?           -1 Ss    1001   0:01 ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
25329 25370 25329 25329 ?           -1 S     1001   0:00  \_ /app/dctm/product/16.4/bin/mthdsvr master 0xe901fc83, 0x7f084db15000, 0x223000 50000  5 25329 dmtest /app/dctm/dba/log
25370 25371 25329 25329 ?           -1 Sl    1001   0:05  |   \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fc83, 0x7f084db15000, 0x223000 50000  5 0 dmtest /app/dctm/dba/log
25370 25430 25329 25329 ?           -1 Sl    1001   0:05  |   \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fc83, 0x7f084db15000, 0x223000 50000  5 1 dmtest /app/dctm/dba/log
25370 25451 25329 25329 ?           -1 Sl    1001   0:05  |   \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fc83, 0x7f084db15000, 0x223000 50000  5 2 dmtest /app/dctm/dba/log
25370 25464 25329 25329 ?           -1 Sl    1001   0:05  |   \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fc83, 0x7f084db15000, 0x223000 50000  5 3 dmtest /app/dctm/dba/log
25370 25482 25329 25329 ?           -1 Sl    1001   0:05  |   \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fc83, 0x7f084db15000, 0x223000 50000  5 4 dmtest /app/dctm/dba/log
25329 25431 25329 25329 ?           -1 S     1001   0:00  \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
25329 25432 25329 25329 ?           -1 S     1001   0:00  \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
25329 25453 25329 25329 ?           -1 S     1001   0:00  \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
25329 25465 25329 25329 ?           -1 S     1001   0:00  \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
25329 25489 25329 25329 ?           -1 S     1001   0:00  \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
25329 26439 25329 25329 ?           -1 Sl    1001   0:11  \_ ./dm_agent_exec -docbase_name dmtest.dmtest -docbase_owner dmadmin -sleep_duration 0
25329 26465 25329 25329 ?           -1 S     1001   0:00  \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
    1 10112 25329 25329 ?           -1 Rl    1001   0:03 ./dm_agent_exec -docbase_name dmtest.dmtest -docbase_owner dmadmin -trace_level 0 -job_id 0800c3508000218b -log_directory /app/dctm/dba/log -docbase_id 50000

On line 3, the CS for docbase dmtest was started with pid 25329 and the same value for its pgid. This process then started a few child processes, all with the pgid 25329.
ps_pgid on line 1 is a bash function defined in ~/.bashrc as follows:

# returns the lines from ps -ajxf with given gpid;
# the ps command's header line is printed only if at least 1 entry is found;
function ps_pgid {
   pgid=$1
   ps -ajxf | gawk -v pgid=$pgid 'BEGIN {getline; header = $0; h_not_printed = 1} {if ($3 == pgid) {if (h_not_printed) {print header; h_not_printed = 0}; print}}'
}

The command shows neither the method server nor the docbroker, as they were started separately from the CS.
Thus, if we execute the command below:

$ /bin/kill --signal SIGKILL -25329

the CS will be killed along with all its child processes, which is exactly what we want.

Putting both commands together, we get:

/bin/kill --signal SIGKILL -$(grep "\[DM_SERVER_I_START_SERVER\]info" /app/dctm/dba/log/dmtest.log | gawk '{if (match($2, /\[[0-9]+\]/)) {print substr($2, RSTART + 1, RLENGTH - 2); exit}}')

It may be worth defining a bash function for it too:

function kill_cs {
   repo=$1
   /bin/kill --signal SIGKILL -$(grep "\[DM_SERVER_I_START_SERVER\]info" /app/dctm/dba/log/${repo}.log | gawk '{if (match($2, /\[[0-9]+\]/)) {print substr($2, RSTART + 1, RLENGTH - 2); exit}}')
}
 
# source it:
. ~/.bashrc
 
# call it:
kill_cs dmtest

where dmtest is the content server to kill.
The naive way to search for the running content server via the command “ps -ef | grep docbase_name” can be too ambiguous in the case of multiple content servers for the same repository (e.g. in a high-availability installation) or when docbase_name is the stem of a family of docbases (e.g. dmtest_1, dmtest_2, …, dmtest_10, etc…). Besides, even if no ambiguity were possible, it would return too many processes to be killed individually. xargs could do it at once, sure, but why risk killing the wrong ones? The above ps_pgid function looks directly for the given group id, which is the root_pid of the content server of interest taken straight out of its log file; no ambiguity there.
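
For instance (purely illustrative, re-using the root_pid found above):

# too broad: this also matches dmtest_1, dmtest_2, ... if they run on the same host
ps -ef | grep "docbase_name dmtest" | grep -v grep
# unambiguous: the process group id comes straight from dmtest's own log
ps_pgid 25329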

Hardening start-stop.sh

This ruthless kill functionality could be added to the start-stop.sh script presented in Part I, either as a command-line option of the stop parameter (say, -k, as in the dm_shutdown_repository script) or as a full parameter on a par with the stop | start | status ones, i.e.:

start-stop.sh stop | start | status | kill ...
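
A minimal sketch of that first variant (assuming the kill_cs function shown above has been sourced, and that the main case statement's start|stop|status branch is simply extended to start|stop|status|kill so that ${cmd}_docbase and ${cmd}_all_docbases resolve to the functions below; only the docbase-related functions are shown, the docbroker and method server counterparts would follow the same pattern):

kill_docbase() {
   docbase=$1
   echo "killing docbase $docbase"
   kill_cs $docbase
   echo "docbase $docbase killed"
}

kill_all_docbases() {
   echo "killing the repositories ..."
   DOCBASES=`ls -1 config 2>/dev/null `
   nb_items=0
   for docbase in $DOCBASES; do
      kill_docbase $docbase
      (( nb_items++ ))
   done
   echo "$nb_items repositories killed"
}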

Or, more simply, by deciding that a stop should always succeed and forcing a kill if needed. In such a variant, the stop_docbase() function becomes:

stop_docbase() {
   docbase=$1
   echo "stopping $docbase"
   ./dm_shutdown_${docbase}
   if [[ $? -eq 1 ]]; then
      echo "killing docbase $docbase"
      kill_cs $docbase
   fi
   echo "docbase $docbase stopped"
}
Conclusion

If the content server were open source we wouldn't have this article's title. Instead, it would be “Forcing impromptu projections to docbrokers through signal handling in content server: an implementation” or “Shutting down a content server by sending a signal: a proposal”. We could send this request to the maintainers and probably receive a positive answer. Or we could implement the changes ourselves and submit them as an RFC. This model does not work so well with closed, commercial source, which evolves following its own marketing agenda. Nonetheless, this situation gives us the opportunity to rant about it and find work-arounds. Imagine a world where all software were flawless: would it be as fun?

The article A Ruthless Repository Shutdown Utility, Part II appeared first on Blog dbi services.

A Ruthless Repository Shutdown Utility, Part I

Yann Neuhaus - Wed, 2019-12-18 13:16

You have finally completed that migration and need to restart all the Documentum processes. So, you shut down the docbroker and move on to the repositories, but then you receive an error message about them not being reachable any more. Or conversely, you want to start all the Documentum processes and you start the repositories first and the docbrokers later. Next, you want to connect to one repository and you receive the same error message. Of course, you finally remember: since the docbroker is a requirement for the repositories, it must be started first and shut down last, but it is too late now. How to get out of this annoying situation? You could just (re)start the docbroker and wait for the repositories' next checkpoint, at most 5 minutes by default. If this is not acceptable, at first sight there is no other solution than to “kill -9” the repositories' processes, start the docbroker and only then the repositories. Let's see if we can find a better way. Spoiler alert: to stop this insufferable suspense, I must say up front that there is no other way, sorry, but there are a few ways to alleviate this inconvenience.

A quick clarification

Let's first clarify a point of terminology here: there is a difference between docbases/repositories and content servers. A docbase encompasses the actual content and its persistent data and technical information, whereas the content server is the set of running processes that give access to and manage one docbase. It is very similar to Oracle's databases and instances, where one database can be served by several instances, providing parallelism and high availability. A docbase can be served by more than one content server, generally spread over different machines, each with its own set of dm_start_docbase and dm_shutdown_docbase scripts and its own server.ini. A docbase knows how many content servers use it because they each have their own dm_server_config object. If there is just one content server, both docbase and content server can be used interchangeably, but when there are several content servers for the same docbase, when one says “stopping the docbase” it really means “stopping one particular content server”, and this is the meaning that will be used in the rest of the article. If the docbase has more than one content server, just extend the presented manipulations to each of them.

Connecting to the repositories without a docbroker

If one could connect to a repository without a running docbroker (the situation that triggered this article), things would be much easier. In ancient, simpler times, the dmcl.ini parameters below could help work around an unavailable docbroker:

[DOCBROKER_DEBUG]
docbase_id = <id of docbase as specified in its server.ini file>
host =  <host's name the docbase server is running on>
port = <docbase's port as specified in /etc/services>
service = <docbase's service name as specified in /etc/services>

and they used to work.
After the switch to the dfc.properties file, those parameters were renamed as follows:

dfc.docbroker.debug.docbase_id=<id of docbase as specified in its server.ini file>
dfc.docbroker.debug.host=<host's name the docbase server is running on>
dfc.docbroker.debug.port=<docbase's port as specified in /etc/services>
dfc.docbroker.debug.service=<docbase's service name as specified in /etc/services>

Unfortunately, they don't work any more. Actually, although they are still documented in dfcfull.properties, they have not been implemented and never will be, according to OTX. Moreover, they will be removed in the future. Too bad, that would have been such a cheap way to extricate oneself from an uncomfortable situation.

Preventing the situation

The best solution is obviously to prevent it from happening. This can easily be achieved by using a central script for stopping and starting the Documentum stack and, while we are at it, for inquiring about its status.
Documentum already provides such a script, e.g. see here Linux scripts for automatic startup and shutdown of Documentum Content Server. Here is another, more sophisticated, implementation:

#!/bin/bash
#
# See Usage() function below for explanations; 
# cec - dbi-services - April 2019
#

general_status=0

Usage() {
   cat <<EoU
Usage:
    start-stop.sh [(help) | start | stop | status] [(all) | docbases | docbrokers | docbase={,} | docbroker={,} | method_server]
 E.g.:
    display this help screen:
       start-stop.sh
    start all:
       start-stop.sh start [all]
    stop all:
       start-stop.sh stop [all]
    status all:
       start-stop.sh status [all]
    start docbroker01:
       start-stop.sh start docbroker=docbroker01
    start docbases global_registry and dmtest01:
       start-stop.sh start docbase=global_registry,dmtest01
    start all the docbases:
       start-stop.sh start docbases
    start all the docbrokers:
       start-stop.sh start docbrokers
EoU
}

start_docbroker() {
   docbroker=$1
   echo "starting up docbroker $docbroker ..."
   ./dm_launch_${docbroker}
}

start_all_docbrokers() {
   echo "starting the docbrokers ..."
   DOCBROKERS=`ls -1 dm_launch_* 2>/dev/null | cut -f3 -d_`
   nb_items=0
   for docbroker in $DOCBROKERS; do
      start_docbroker $docbroker
      (( nb_items++ ))
   done
   echo "$nb_items docbrokers started"

}

start_docbase() {
   docbase=$1
   echo "starting $docbase"
   ./dm_start_${docbase}
}

start_all_docbases() {
   echo "starting the repositories ..."
   DOCBASES=`ls -1 config 2>/dev/null `
   nb_items=0
   for docbase in $DOCBASES; do
      start_docbase $docbase
      (( nb_items++ ))
   done
   echo "$nb_items repositories started"
}

start_method_server() {
   echo "starting the method server ..."
   cd ${DOCUMENTUM}/${JBOSS}/server
   nohup ${DOCUMENTUM}/${JBOSS}/server/startMethodServer.sh 2>&1 > /tmp/nohup.log &
   echo "method server started"
}

start_all() {
   echo "starting all the documentum processes ..."
   start_all_docbrokers
   start_all_docbases
   start_method_server
}

status_docbroker() {
   docbroker_name=$1
   docbroker_host=$(grep "^host=" /app/dctm/dba/dm_launch_${docbroker_name} | cut -d= -f2)
   docbroker_port=$(grep "dmdocbroker -port " /app/dctm/dba/dm_launch_${docbroker_name} | cut -d\  -f3)
   dmqdocbroker -t $docbroker_host -p $docbroker_port -c ping 2> /dev/null 1> /dev/null
   local_status=$?
   if [ $local_status -eq 0 ]; then
      echo "$(date +"%Y/%m/%d %H:%M:%S"): successfully pinged docbroker $docbroker_name listening on port $docbroker_port on host $docbroker_host"
   else
      echo "$(date +"%Y/%m/%d %H:%M:%S"): docbroker $docbroker_name listening on port $docbroker_port on host $docbroker_host is unhealthy"
      general_status=1
   fi
   echo "status for docbroker $docbroker_name:$docbroker_port: $local_status, i.e. $(if [[ $local_status -eq 0 ]]; then echo OK; else echo NOK;fi)"
}

status_all_docbrokers() {
   DOCBROKERS=`ls -1 dm_launch_* 2>/dev/null | cut -f3 -d_`
   DOCBROKERS_PORTS=`grep -h "./dmdocbroker" dm_launch_* | cut -f3 -d\ `
   for f in `ls -1 dm_launch_* 2>/dev/null `; do
      docbroker_name=`echo $f | cut -f3 -d_`
      docbroker_port=`grep "./dmdocbroker" $f | cut -f3 -d\ `
      status_docbroker $docbroker_name $docbroker_port
   done
   echo "general status for all docbrokers: $general_status, i.e. $(if [[ $general_status -eq 0 ]]; then echo OK; else echo NOK;fi)"
}

status_docbase() {
   docbase=$1
   timeout --preserve-status 30s idql $docbase -Udmadmin -Pxx 2> /dev/null 1> /dev/null <<eoq
     quit
eoq
   local_status=$?
   if [[ $local_status -eq 0 ]]; then
      echo "$(date +"%Y/%m/%d %H:%M:%S"): successful connection to repository $docbase"
   else
      echo "$(date +"%Y/%m/%d %H:%M:%S"): repository $docbase is unhealthy"
      general_status=1
   fi
   echo "status for docbase $docbase: $local_status, i.e. $(if [[ $local_status -eq 0 ]]; then echo OK; else echo NOK;fi)"
}

status_all_docbases() {
   DOCBASES=`ls -1 config 2>/dev/null `
   for docbase in $DOCBASES; do
      status_docbase $docbase
   done
   echo "general status for all docbases: $general_status, i.e. $(if [[ $general_status -eq 0 ]]; then echo OK; else echo NOK;fi)"
}

status_method_server() {
   # check the method server;
   curl --silent --fail -k http://${HOSTNAME}:9080/DmMethods/servlet/DoMethod 2>&1 > /dev/null
   local_status=$?
   if [ $local_status -eq 0 ]; then
      echo "$(date +"%Y/%m/%d %H:%M:%S"): method server successfully contacted"
   else
      echo "$(date +"%Y/%m/%d %H:%M:%S"): method server is unhealthy"
      general_status=1
   fi
   echo "status for method_server: $local_status, i.e. $(if [[ $local_status -eq 0 ]]; then echo OK; else echo NOK;fi)"
}

status_all() {
   status_all_docbrokers
   status_all_docbases
   status_method_server
   echo "General status: $general_status, i.e. $(if [[ $general_status -eq 0 ]]; then echo OK; else echo NOK;fi)"
}

stop_docbase() {
   docbase=$1
   echo "stopping $docbase"
   ./dm_shutdown_${docbase}
   echo "docbase $docbase stopped"
}

stop_all_docbases() {
   echo "stopping the repositories ..."
   DOCBASES=`ls -1 config 2>/dev/null `
   nb_items=0
   for docbase in $DOCBASES; do
      stop_docbase $docbase
      (( nb_items++ ))
   done
   echo "$nb_items repositories stopped"
}

stop_docbroker() {
   docbroker=$1
   echo "stopping docbroker $docbroker ..."
   ./dm_stop_${docbroker}
   echo "docbroker $docbroker stopped"
}

stop_all_docbrokers() {
   echo "stopping the docbrokers ..."
   DOCBROKERS=`ls -1 dm_stop_* 2>/dev/null | cut -f3 -d_`
   nb_items=0
   for docbroker in $DOCBROKERS; do
      stop_docbroker $docbroker
      (( nb_items++ ))
   done
   echo "$nb_items docbrokers stopped"
}

stop_method_server() {
   echo "stopping the method server ..."
   ${DOCUMENTUM}/${JBOSS}/server/stopMethodServer.sh
   echo "method server stopped"
}

stop_all() {
   echo "stopping all the documentum processes ..."
   stop_all_docbases
   stop_method_server
   stop_all_docbrokers
   echo "all documentum processes stopped"
   ps -ajxf | egrep '(PPID|doc|java)' | grep -v grep | sort -n -k2,2
}

# -----------
# main;
# -----------
   [[ -f ${DM_HOME}/bin/dm_set_server_env.sh ]] && . ${DM_HOME}/bin/dm_set_server_env.sh
   cd ${DOCUMENTUM}/dba
   if [[ $# -eq 0 ]]; then
      Usage
      exit 0
   else
      while [[ $# -ge 1 ]]; do
         case $1 in
	    help)
	       Usage
	       exit 0
	    ;;
            start|stop|status)
	       cmd=$1
	       shift
	       if [[ -z $1 || $1 = "all" ]]; then
	          ${cmd}_all
	       elif [[ $1 = "docbases" ]]; then
	          ${cmd}_all_docbases
	       elif [[ $1 = "docbrokers" ]]; then
	          ${cmd}_all_docbrokers
	       elif [[ ${1%%=*} = "docbase" ]]; then
	          docbases=`echo ${1##*=} | gawk '{gsub(/,/, " "); print}'`
                  for docbase in $docbases; do
	             ${cmd}_docbase $docbase
	          done
	       elif [[ ${1%%=*} = "docbroker" ]]; then
	          docbrokers=`echo ${1##*=} | gawk '{gsub(/,/, " "); print}'`
                  for docbroker in $docbrokers; do
	             ${cmd}_docbroker $docbroker
	          done
	       elif [[ $1 = "method_server" ]]; then
                  ${cmd}_method_server
               fi
               exit $general_status
            ;;
            *)
               echo "syntax error"
	       Usage
	       exit 1
	    ;;
         esac
         shift
      done
   fi

See lines 11 to 29 for its usage.
Note on line 110 the timeout command when attempting to connect to a docbase to check its status; see the article Adding a timeout in monitoring probes for an explanation.
We couldn't help adding the option to address each component individually, or a few of them, in addition to all of them at once. So, the script lets us stop, start and inquire about the status of one particular docbroker or docbase or method server, or a list of docbrokers or a list of docbases, or everything at once.
After a maintenance task, to stop all the Documentum processes, the command below could be used:

$ start-stop.sh stop all

Similarly, to start everything:

$ start-stop.sh start all

Thus, the proper order is guaranteed to be used and human error is prevented. By standardizing on such a script and using it as shown, the aforementioned problem won't occur anymore.

That is fine, but if we didn't use the script and find ourselves in a situation where no docbroker is running and we must shut down the repositories, is there a way to do it easily and cleanly? Well, easily, certainly, but cleanly, no. Please continue reading in Part II.

The article A Ruthless Repository Shutdown Utility, Part I appeared first on Blog dbi services.

Datapump Import Partitioned Tables ORA-00600 qesmaGetPamR-NullCtx

Bobby Durrett's DBA Blog - Wed, 2019-12-18 10:28

I have not yet had time to build a test case and prove this out, but I wanted to document one last bug that we found so far in our 11.2.0.4 to 19c upgrade. We tried copying a bunch of partitioned tables on our source database to the new one using Datapump Import (impdp) over a database link. We got a boatload of errors like this:

ORA-00600: internal error code, arguments: [qesmaGetPamR-NullCtx], 

There are many Oracle bugs like this, but they seem to have been fixed in 11.2.0.4. For example:

Bug 12591399 – ORA-600[qesmagetpamr-nullctx] / ORA-14091 with distributed query with local partition table (Doc ID 12591399.8)

Puzzling. We ended up just exporting to disk and that has worked well, so no big deal, but I wonder if this is some sort of regression of a fixed bug.
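
For what it's worth, a minimal sketch of the disk-based fallback (file and object names here are illustrative, not our real ones):

$ cat bobby_disk_test.par
userid=MYUSER/MYPASSWORD
JOB_NAME=BOBBY_DISK_TEST
DIRECTORY=BOBBY_DIR
DUMPFILE=bobby_test.dmp
LOGFILE=bobby_disk_test.log
tables=TEST

$ expdp parfile=bobby_disk_test.par    # run on the 11.2.0.4 source
$ impdp parfile=bobby_disk_test.par    # run on the 19c target after copying the dump file over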

Anyway, I am off for the rest of the year. This should be my last post unless I mess with Nethack over vacation and post something about that. I hope everyone out there has a good new year.

Bobby

P.S. Created a simple partitioned table with 2 partitions and 100 rows in each one. I got the error importing over a link from 11.2.0.4 to 19c. It worked perfectly going from 11.2.0.4 to 11.2.0.4. Same source table. Parfile:

$ cat bobby_link_test.par
userid=MYUSER/MYPASSWORD
JOB_NAME=BOBBY_TEST
DIRECTORY=BOBBY_DIR
NETWORK_LINK=MYLINK
LOGFILE=bobby_link_test.log
tables=TEST
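
With this parfile, the network-mode import that raised the error is then kicked off with a one-liner along these lines (nothing else is needed on the command line since the credentials, link, log file and table list all come from the parfile):

$ impdp parfile=bobby_link_test.par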

Table:

CREATE TABLE test
(
  PART_COL              NUMBER,
  data                  NUMBER
)
PARTITION BY RANGE (PART_COL)
(  
  PARTITION PART_COL_1 VALUES LESS THAN (100),  
  PARTITION PART_COL_2 VALUES LESS THAN (200)
)
;
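
Not shown above is the data load; one quick way to get the 100 rows per partition mentioned earlier would be something like this (the connect string is the same placeholder as in the parfile):

$ sqlplus MYUSER/MYPASSWORD <<EOF
insert into test select mod(rownum,2)*100 + 50, rownum from dual connect by level <= 200;
commit;
EOF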

PPS. Works fine going from 11.2.0.4 to 18c. Going to try a different 19c database just to be sure it isn’t the one that has the problem.

PPPS. Definitely a 19c bug. It fails on two different 19c databases but not on 18c. In every case source is same 11.2.0.4 database and same small partitioned table. Does anyone have time to file the bug report?

Categories: DBA Blogs

Wait for Java

Jonathan Lewis - Wed, 2019-12-18 03:59

This is a note courtesy of Jack van Zanen on the Oracle-L list server, who asked a question about “wait for CPU” and then produced the answer a couple of days later. It’s a simple demonstration of how Java in the database can be very deceptive in terms of indicating CPU usage that isn’t really CPU usage.

Bottom line – when you call Java, Oracle knows you’re about to start doing some work on the CPU, but once you’re inside the Java engine Oracle has no way of knowing whether the Java code is on the CPU or waiting. So if the Java code starts to wait (e.g. for some slow file I/O), Oracle will still report your session as using CPU.

To demonstrate the principle, I’m going to create a little Java procedure that simply goes to sleep – and see what I find in the active session history (ASH) after I’ve been sleeping in Java for 10 seconds.

rem
rem     Script:         java_wait_for_cpu.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Based on an email from Jack van Zanen to Oracle-L
rem

set time on

create or replace procedure milli_sleep(i_milliseconds in number) 
as 
        language java
        name 'java.lang.Thread.sleep(int)';
/

set pagesize 60
set linesize 132
set trimspool on

column sample_time format a32
column event       format a32
column sql_text    format a60
column sql_id      new_value m_sql_id

set echo on
execute milli_sleep(1e4)

select 
        sample_time, sample_id, session_state, sql_id, event 
from 
        v$active_session_history
where 
        session_id = sys_context('userenv','sid')
and     sample_time > sysdate - 1/1440 
order by 
        sample_time
;

select sql_id, round(cpu_time/1e6,3) cpu_time, round(elapsed_time/1e6,3) elapsed, sql_text from v$sql where sql_id = '&m_sql_id';

I’ve set timing on and set echo on so that you can see when my code starts and finishes and correlate it with the report from v$active_session_history for my session. Since I’ve reported the last minute, you may find some other stuff reported before the call to milli_sleep(), but you should find that you get a report of about 10 seconds “ON CPU” even though your session is really not consuming any CPU at all. I’ve included a report of the SQL that’s “running” while the session is “ON CPU”.

Here (with a little edit to remove the echoed query against v$active_session_history) are the results from a run on 12.2.0.1 (and the run on 19.3.0.0 was very similar):


Procedure created.

18:51:17 SQL> execute milli_sleep(1e4)

PL/SQL procedure successfully completed.

SAMPLE_TIME                       SAMPLE_ID SESSION SQL_ID        EVENT
-------------------------------- ---------- ------- ------------- --------------------------------
16-DEC-19 06.51.11.983 PM          15577837 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.12.984 PM          15577838 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.13.985 PM          15577839 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.14.985 PM          15577840 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.15.986 PM          15577841 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.16.996 PM          15577842 ON CPU  8r3xn050z2uqm
16-DEC-19 06.51.17.995 PM          15577843 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.18.999 PM          15577844 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.20.012 PM          15577845 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.21.018 PM          15577846 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.22.019 PM          15577847 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.23.019 PM          15577848 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.24.033 PM          15577849 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.25.039 PM          15577850 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.26.047 PM          15577851 ON CPU  4jt6zf4nybawp
16-DEC-19 06.51.27.058 PM          15577852 ON CPU  4jt6zf4nybawp

16 rows selected.

18:51:27 SQL>
18:51:27 SQL> select sql_id, round(cpu_time/1e6,3) cpu_time, round(elapsed_time/1e6,3) elapsed, sql_text from v$sql where sql_id = '&m_sql_id';

SQL_ID          CPU_TIME    ELAPSED SQL_TEXT
------------- ---------- ---------- ------------------------------------------------------------
4jt6zf4nybawp       .004     10.029 BEGIN milli_sleep(1e4); END;


As you can see, I had a statement executing for a few seconds before the call to milli_sleep(), and then we see milli_sleep() “on” the CPU for 10 consecutive samples; yet when the sleep ends, the query for actual usage shows us that the elapsed time was 10 seconds while the CPU usage was only 4 milliseconds.

 

OT Footnote

I’ve decided this year to donate to a charity that works to reduce child mortality rates in Nepal with a two-pronged attack on malnutrition: feeding starving children, then educating their parents on how to make best use of local resources to grow the most appropriate crops and use the best preparation methods to produce nourishing meals in the future. (They also run other projects to improve the lives of the young people in Nepal – here’s a link to their home page, and a direct link to a 4 minute video that gives you a quick insight into what they do and how they do it.)

If you’re thinking of making any small donations to charity over the next few weeks, please think of this one. To make your donation more valuable I’ve set up a justgiving page and will match any donations made, up to a total of £1,000.

Thank you.

Spring Boot JPA project riff function demo

Pas Apicella - Tue, 2019-12-17 22:09
riff is an Open Source platform for building and running Functions, Applications, and Containers on Kubernetes. For more information visit the project riff home page https://projectriff.io/

riff supports running containers using Knative serving which in turn provides support for
  •     0-N autoscaling
  •     Revisions
  •     HTTP routing using Istio ingress
Want to try an example? If so, head over to the following GitHub project, which shows step by step how to run a Spring Data JPA function with riff on a GKE cluster:

https://github.com/papicella/SpringDataJPAFunction


More Information

1. Project riff home page
https://projectriff.io/

2. Getting started with riff
https://projectriff.io/docs/v0.5/getting-started

Categories: Fusion Middleware

Oracle Database 19c Automatic Indexing – Indexed Column Reorder (What Shall We Do Now?)

Richard Foote - Tue, 2019-12-17 18:49
  I previously discussed how the default column order of an Automatic Index (in the absence of other factors) is based on the Column ID, the order in which the columns are defined in the table. But what if there are “other factors” based on new workloads and the original index column order is no […]
Categories: DBA Blogs
