Yann Neuhaus

dbi services technical blog

Securely store passwords in PostgreSQL

Fri, 2019-05-31 00:25

Every application somehow needs to deal with passwords. Some use external authentication methods such as LDAP, others use the framework the database already provides and create users and roles. But it is also not uncommon that applications implement their own concept for managing users. If an application does this, it should be done the right way and passwords should never be stored in plain text in the database. PostgreSQL comes with a handy extension that supports you with that.

You might already be aware that PostgreSQL ships with a lot of additional modules by default. One of these modules is pgcrypto and it can be used for the use case described above: hashing and verifying passwords so you do not have to implement that on your own. Let's start with a simple table which contains usernames and their passwords:

postgres=# create table app_users ( id int generated always as identity ( cache 10 ) primary key
postgres(#                        , username text not null unique
postgres(#                        , password text not null
postgres(#                        );
CREATE TABLE
postgres=# \d app_users
                         Table "public.app_users"
  Column  |  Type   | Collation | Nullable |           Default            
----------+---------+-----------+----------+------------------------------
 id       | integer |           | not null | generated always as identity
 username | text    |           | not null | 
 password | text    |           | not null | 
Indexes:
    "app_users_pkey" PRIMARY KEY, btree (id)
    "app_users_username_key" UNIQUE CONSTRAINT, btree (username)
postgres=# 

Both the username and password columns are implemented as plain text. If you keep it like that and just insert data, the password will of course be stored in plain text. So how can we use pgcrypto to improve that? Obviously the first step is to install the extension:

postgres=# create extension pgcrypto;
CREATE EXTENSION
postgres=# \dx
                   List of installed extensions
    Name    | Version |   Schema   |         Description          
------------+---------+------------+------------------------------
 pg_prewarm | 1.2     | public     | prewarm relation data
 pgcrypto   | 1.3     | public     | cryptographic functions
 plpgsql    | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

By the way: there is a catalog view which you can use to list all available extensions:

postgres=# \d pg_available_extensions;
         View "pg_catalog.pg_available_extensions"
      Column       | Type | Collation | Nullable | Default 
-------------------+------+-----------+----------+---------
 name              | name |           |          | 
 default_version   | text |           |          | 
 installed_version | text | C         |          | 
 comment           | text |           |          | 

postgres=# select * from pg_available_extensions limit 3;
  name   | default_version | installed_version |                comment                 
---------+-----------------+-------------------+----------------------------------------
 plpgsql | 1.0             | 1.0               | PL/pgSQL procedural language
 plperl  | 1.0             |                   | PL/Perl procedural language
 plperlu | 1.0             |                   | PL/PerlU untrusted procedural language
(3 rows)

The function to use (provided by the pgcrypto module) for hashing passwords is crypt(). This function takes two arguments:

  • The actual string to hash
  • The salt to use (a random value) for the hash

Adding a user with a hashed password is as easy as:

postgres=# insert into app_users (username, password) 
postgres-#        values ( 'myuser', crypt('mypassword', gen_salt('bf')) );
INSERT 0 1

In this case we used the Blowfish algorithm (bf) to generate the salt. You can also use md5, xdes and des.
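
If you want to make brute forcing more expensive, gen_salt() optionally accepts an iteration count as a second argument (for bf the documented range is 4 to 31, the default being 6); higher values mean slower hashing. A minimal sketch, reusing the table from above (the user 'myuser2' is just for illustration):

postgres=# insert into app_users (username, password) 
postgres-#        values ( 'myuser2', crypt('mypassword', gen_salt('bf', 10)) );
INSERT 0 1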

When we look at the password for our user we will see that it is not plain text anymore:

postgres=# select password from app_users where username = 'myuser';
                           password                           
--------------------------------------------------------------
 $2a$06$8wu4VWVubv/RBYBSuj.1TOojPm0q7FkRwuDSoW0OTOC6FzBGEslIC
(1 row)

That was the hashing part. To compare this hash against the plain text version of the password we use the crypt() function again:

postgres=# select (password = crypt('mypassword', password)) AS pwd_match 
postgres-#   from app_users
postgres-#  where username = 'myuser';
 pwd_match 
-----------
 t
(1 row)

Providing the wrong password of course returns false:

postgres=# select (password = crypt('Xmypassword', password)) AS pwd_match 
  from app_users
 where username = 'myuser';
 pwd_match 
-----------
 f
(1 row)
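
In a real application you would typically combine the username lookup and the password check into a single query. A minimal sketch along those lines, using the table from above:

postgres=# select count(*) = 1 as login_ok
postgres-#   from app_users
postgres-#  where username = 'myuser'
postgres-#    and password = crypt('mypassword', password);
 login_ok 
----------
 t
(1 row)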

So finally, there is not much you need to do to store passwords securely in PostgreSQL. Just use it.

This article Securely store passwords in PostgreSQL appeared first on the dbi services blog.

Can you start two (or more) PostgreSQL instances against the same data directory?

Thu, 2019-05-30 06:41

As PostgreSQL does not know the concept of running multiple instances against the same files on disk (e.g. like Oracle RAC), it should not be possible to start two or more instances against the same data directory. If that worked, the result could only be corruption. In this post we will look at how PostgreSQL detects that and what mechanisms are built in to avoid the situation of having multiple instances working against the same files on disk.

To start with we create a new cluster:

postgres@rhel8pg:/home/postgres/ [PGDEV] mkdir /var/tmp/pgtest
12:16:46 postgres@rhel8pg:/home/postgres/ [PGDEV] initdb -D /var/tmp/pgtest/
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  en_US.utf8
  CTYPE:    en_US.utf8
  MESSAGES: en_US.utf8
  MONETARY: de_CH.UTF-8
  NUMERIC:  de_CH.UTF-8
  TIME:     en_US.UTF-8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/tmp/pgtest ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/Zurich
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/tmp/pgtest/ -l logfile start

We use a dedicated port and then start it up:

postgres@rhel8pg:/home/postgres/ [PGDEV] export PGPORT=8888
postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest start
waiting for server to start....2019-05-16 12:17:22.399 CEST [7607] LOG:  starting PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), 64-bit
2019-05-16 12:17:22.403 CEST [7607] LOG:  listening on IPv6 address "::1", port 8888
2019-05-16 12:17:22.403 CEST [7607] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2019-05-16 12:17:22.409 CEST [7607] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2019-05-16 12:17:22.446 CEST [7608] LOG:  database system was shut down at 2019-05-16 12:16:54 CEST
2019-05-16 12:17:22.455 CEST [7607] LOG:  database system is ready to accept connections
 done
server started

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -p 8888 -c "select version()" postgres
                                                  version                                                  
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), 64-bit
(1 row)

What happens when we want to start another instance against that data directory?

postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2019-05-16 12:18:26.252 CEST [7629] FATAL:  lock file "postmaster.pid" already exists
2019-05-16 12:18:26.252 CEST [7629] HINT:  Is another postmaster (PID 7607) running in data directory "/var/tmp/pgtest"?
 stopped waiting
pg_ctl: could not start server
Examine the log output.
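
As a side note, pg_ctl itself can tell you whether an instance is already running against a data directory without attempting a start; a quick check (the PID and the binary path will of course differ on your system):

postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest status
pg_ctl: server is running (PID: 7607)
/u01/app/postgres/product/DEV/db_1/bin/postgres "-D" "/var/tmp/pgtest"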

When PostgreSQL is starting up it looks for a file called "postmaster.pid", which exists in the data directory once an instance is started. If that file exists, PostgreSQL will not start up another instance against the same data directory. Once the instance is stopped the file is removed:

postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest/ stop
waiting for server to shut down....2019-05-16 12:48:50.636 CEST [7896] LOG:  received fast shutdown request
2019-05-16 12:48:50.641 CEST [7896] LOG:  aborting any active transactions
2019-05-16 12:48:50.651 CEST [7896] LOG:  background worker "logical replication launcher" (PID 7903) exited with exit code 1
2019-05-16 12:48:50.651 CEST [7898] LOG:  shutting down
2019-05-16 12:48:50.685 CEST [7896] LOG:  database system is shut down
 done
server stopped
postgres@rhel8pg:/home/postgres/ [PGDEV] ls -al /var/tmp/pgtest/postmaster.pid
ls: cannot access '/var/tmp/pgtest/postmaster.pid': No such file or directory

So, at least by default, it is not possible to start two or more instances, because PostgreSQL checks whether postmaster.pid already exists. Let's remove that file and try again:

postgres@rhel8pg:/home/postgres/ [PGDEV] rm /var/tmp/pgtest/postmaster.pid
postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest start
waiting for server to start....2019-05-16 12:20:17.754 CEST [7662] LOG:  starting PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), 64-bit
2019-05-16 12:20:17.756 CEST [7662] LOG:  could not bind IPv6 address "::1": Address already in use
2019-05-16 12:20:17.756 CEST [7662] HINT:  Is another postmaster already running on port 8888? If not, wait a few seconds and retry.
2019-05-16 12:20:17.756 CEST [7662] LOG:  could not bind IPv4 address "127.0.0.1": Address already in use
2019-05-16 12:20:17.756 CEST [7662] HINT:  Is another postmaster already running on port 8888? If not, wait a few seconds and retry.
2019-05-16 12:20:17.756 CEST [7662] WARNING:  could not create listen socket for "localhost"
2019-05-16 12:20:17.756 CEST [7662] FATAL:  could not create any TCP/IP sockets
2019-05-16 12:20:17.756 CEST [7662] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

Again, this does not work, and the initial instance was even shut down because PostgreSQL detected that the lock file is not there anymore:

2019-05-16 12:20:22.540 CEST [7607] LOG:  could not open file "postmaster.pid": No such file or directory
2019-05-16 12:20:22.540 CEST [7607] LOG:  performing immediate shutdown because data directory lock file is invalid
2019-05-16 12:20:22.540 CEST [7607] LOG:  received immediate shutdown request
2019-05-16 12:20:22.540 CEST [7607] LOG:  could not open file "postmaster.pid": No such file or directory
2019-05-16 12:20:22.544 CEST [7612] WARNING:  terminating connection because of crash of another server process
2019-05-16 12:20:22.544 CEST [7612] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-05-16 12:20:22.544 CEST [7612] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-05-16 12:20:22.549 CEST [7664] WARNING:  terminating connection because of crash of another server process
2019-05-16 12:20:22.549 CEST [7664] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Let's start the first instance again:

postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest/ start
waiting for server to start....2019-05-16 12:22:20.136 CEST [7691] LOG:  starting PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), 64-bit
2019-05-16 12:22:20.140 CEST [7691] LOG:  listening on IPv6 address "::1", port 8888
2019-05-16 12:22:20.140 CEST [7691] LOG:  listening on IPv4 address "127.0.0.1", port 8888
2019-05-16 12:22:20.148 CEST [7691] LOG:  listening on Unix socket "/tmp/.s.PGSQL.8888"
2019-05-16 12:22:20.193 CEST [7693] LOG:  database system was interrupted; last known up at 2019-05-16 12:17:22 CEST
.2019-05-16 12:22:21.138 CEST [7693] LOG:  database system was not properly shut down; automatic recovery in progress
2019-05-16 12:22:21.143 CEST [7693] LOG:  redo starts at 0/15D3420
2019-05-16 12:22:21.143 CEST [7693] LOG:  invalid record length at 0/15D3458: wanted 24, got 0
2019-05-16 12:22:21.143 CEST [7693] LOG:  redo done at 0/15D3420
2019-05-16 12:22:21.173 CEST [7691] LOG:  database system is ready to accept connections
 done
server started

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -p 8888 -c "select version()" postgres
                                                  version                                                  
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 12devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3), 64-bit
(1 row)

Let's change the port for the second instance and then try again to start it against the same data directory:

postgres@rhel8pg:/home/postgres/ [PGDEV] export PGPORT=8889
postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest/ start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2019-05-16 12:24:41.700 CEST [7754] FATAL:  lock file "postmaster.pid" already exists
2019-05-16 12:24:41.700 CEST [7754] HINT:  Is another postmaster (PID 7741) running in data directory "/var/tmp/pgtest"?
 stopped waiting
pg_ctl: could not start server
Examine the log output.

That does not work either, which is good. Let's be a bit nastier and truncate the postmaster.pid file:

postgres@rhel8pg:/home/postgres/ [PGDEV] cat /var/tmp/pgtest/postmaster.pid 
7790
/var/tmp/pgtest
1558002434
8888
/tmp
localhost
  8888001    819201
ready   
postgres@rhel8pg:/home/postgres/ [PGDEV] cat /dev/null > /var/tmp/pgtest/postmaster.pid
postgres@rhel8pg:/home/postgres/ [PGDEV] cat /var/tmp/pgtest/postmaster.pid

The PID file is now empty, and right after emptying it we can see this in the PostgreSQL log file:

2019-05-16 12:30:14.140 CEST [7790] LOG:  lock file "postmaster.pid" contains wrong PID: 0 instead of 7790
2019-05-16 12:30:14.140 CEST [7790] LOG:  performing immediate shutdown because data directory lock file is invalid
2019-05-16 12:30:14.140 CEST [7790] LOG:  received immediate shutdown request
2019-05-16 12:30:14.149 CEST [7795] WARNING:  terminating connection because of crash of another server process
2019-05-16 12:30:14.149 CEST [7795] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-05-16 12:30:14.149 CEST [7795] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-05-16 12:30:14.160 CEST [7790] LOG:  database system is shut down

So even this case is detected and PostgreSQL protects you from starting up another instance against the same data directory. Let's try something else and modify PGDATA in the postmaster.pid file:

postgres@rhel8pg:/home/postgres/ [PGDEV] cat /var/tmp/pgtest/postmaster.pid 
7896
/var/tmp/pgtest
1558002751
8888
/tmp
localhost
  8888001    851969
ready   

postgres@rhel8pg:/home/postgres/ [PGDEV] sed -i  's/\/var\/tmp\/pgtest/\/var\/tmp\/pgtest2/g' /var/tmp/pgtest/postmaster.pid 
postgres@rhel8pg:/home/postgres/ [PGDEV] cat /var/tmp/pgtest/postmaster.pid
7896
/var/tmp/pgtest2
1558002751
8888
/tmp
localhost
  8888001    851969
ready   

Although we changed PGDATA, PostgreSQL still will not start up another instance against this data directory:

postgres@rhel8pg:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/pgtest/ start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2019-05-16 12:35:28.540 CEST [7973] FATAL:  lock file "postmaster.pid" already exists
2019-05-16 12:35:28.540 CEST [7973] HINT:  Is another postmaster (PID 7896) running in data directory "/var/tmp/pgtest"?
 stopped waiting
pg_ctl: could not start server
Examine the log output.

So by default you cannot get PostgreSQL to start two or even more instances against the same data directory. There is a comment about this behaviour in src/backend/postmaster/postmaster.c in the source code:

                /*
                 * Once a minute, verify that postmaster.pid hasn't been removed or
                 * overwritten.  If it has, we force a shutdown.  This avoids having
                 * postmasters and child processes hanging around after their database
                 * is gone, and maybe causing problems if a new database cluster is
                 * created in the same place.  It also provides some protection
                 * against a DBA foolishly removing postmaster.pid and manually
                 * starting a new postmaster.  Data corruption is likely to ensue from
                 * that anyway, but we can minimize the damage by aborting ASAP.
                 */

“Once a minute” might be critical, and we might be able to start a second one if we are fast enough, so let's try again. This time we start the first one, remove the lock file and immediately start another one using another port:

export PGPORT=8888
pg_ctl -D /var/tmp/pgtest start
rm -f /var/tmp/pgtest/postmaster.pid
export PGPORT=8889
pg_ctl -D /var/tmp/pgtest start

And here you have it:

postgres@rhel8pg:/home/postgres/ [pg120] ps -ef | grep postgres
postgres  1445     1  0 May27 ?        00:00:00 /usr/lib/systemd/systemd --user
postgres  1456  1445  0 May27 ?        00:00:00 (sd-pam)
root      9780   786  0 06:09 ?        00:00:00 sshd: postgres [priv]
postgres  9783  9780  0 06:09 ?        00:00:00 sshd: postgres@pts/1
postgres  9784  9783  0 06:09 pts/1    00:00:00 -bash
postgres 10302     1  0 06:19 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /var/tmp/pgtest
postgres 10304 10302  0 06:19 ?        00:00:00 postgres: checkpointer   
postgres 10305 10302  0 06:19 ?        00:00:00 postgres: background writer   
postgres 10306 10302  0 06:19 ?        00:00:00 postgres: walwriter   
postgres 10307 10302  0 06:19 ?        00:00:00 postgres: autovacuum launcher   
postgres 10308 10302  0 06:19 ?        00:00:00 postgres: stats collector   
postgres 10309 10302  0 06:19 ?        00:00:00 postgres: logical replication launcher   
postgres 10313     1  0 06:19 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /var/tmp/pgtest
postgres 10315 10313  0 06:19 ?        00:00:00 postgres: checkpointer   
postgres 10316 10313  0 06:19 ?        00:00:00 postgres: background writer   
postgres 10317 10313  0 06:19 ?        00:00:00 postgres: walwriter   
postgres 10318 10313  0 06:19 ?        00:00:00 postgres: autovacuum launcher   
postgres 10319 10313  0 06:19 ?        00:00:00 postgres: stats collector   
postgres 10320 10313  0 06:19 ?        00:00:00 postgres: logical replication launcher   
postgres 10327  9784  0 06:19 pts/1    00:00:00 ps -ef

Conclusion: PostgreSQL does some basic checks to avoid starting two instances against the same files on disk. But if you really want to (and of course you should never do that), you can achieve it, with all the consequences. Don't do it!

This article Can you start two (or more) PostgreSQL instances against the same data directory? appeared first on the dbi services blog.

Your cloud is just waiting for you

Wed, 2019-05-29 05:01

After several trainings given by dbi services on different cloud service providers and several real-world experiences, I am enthusiastic about the evolution of digital technologies. Azure, Amazon AWS and Oracle OCI offer excellent infrastructure services whose names differ but whose concepts are the same. There are also a few differences between the cloud infrastructure services that can be interesting.

Maybe you too are interested in the cloud but it all still seems unclear. My answer to that is to see the cloud like a traditional data center, except that instead of provisioning your machines and your connections by hand in your over-air-conditioned server room, everything is now done with a few clicks (or command lines).

Let me share some information about Oracle's OCI cloud to clear up the rumors and help you move forward.

Servers

You can have physical machines and/or virtual machines. These physical and virtual machines then come in different "shapes" (templates):

  • Standard shapes: the standard for most applications
  • DenseIO shapes: made for data-intensive workloads
  • GPU shapes: made for workloads that use graphics processors (Bitcoin mining, algorithms to win the game of Go, ...)
  • High performance computing (HPC) shapes: physical machines only, made for massively parallel CPU-intensive workloads

Provisioning time: a few minutes

Network

The connection possibilities are almost unlimited. In particular it is possible to:

  • Connect your cloud infrastructure to the internet and/or to your data center
  • Connect the Oracle cloud to another cloud (Amazon AWS for example)
  • Connect your cloud tenancy to your on-premises infrastructure with a "FastConnect" link, offering a connection with a bandwidth of at least 10Gb

These connection options cover every scenario and avoid being locked in at Oracle. You can even turn your cloud tenancy into an extension of your data center thanks to "FastConnect".

The Autonomous Database service

The Autonomous Database service covers two types of workloads:

  • Standard application workloads (very frequent short queries)
  • Data warehouse workloads (long-running individual operations)

With an autonomous database you only configure the number of CPUs and the storage capacity, without service interruption or performance degradation. In addition, all of the following operations are handled automatically:

  • Database creation
  • Database backups
  • Updates of database features
  • Patching of database security vulnerabilities and bugs
  • Database tuning

You can keep your existing licenses for the cloud or use the licenses included in the service.

Available countries (for now)

List of countries available for your cloud: https://docs.cloud.oracle.com/iaas/Content/General/Concepts/regions.htm

  • United Kingdom
  • Germany
  • Canada
  • United States
  • Japan
  • South Korea
  • Switzerland (September 2019)
Pricing

You can pay as you go or via a subscription. You can compare the prices yourself using the online calculators provided by the vendors:

  • Microsoft Azure: https://azure.microsoft.com/en-in/pricing/calculator
  • Oracle Cloud: https://cloud.oracle.com/en_US/cost-estimator
  • Amazon AWS: https://calculator.s3.amazonaws.com/index.html

For example, for a virtual server with 8 CPUs and around 120GB of memory you would pay per month:

  • Microsoft Azure (DS13v2): 648.97 $
  • Oracle Cloud (B88514): 401.00 $
  • Amazon AWS (f1.x2large): 1207.00 $
Conclusion

While the other cloud providers have been well ahead of Oracle for a few years, Oracle now offers a solid and easy to understand infrastructure service (Oracle OCI), just like the others. The advantage over some others is that physical machines are available. And finally, you are not tied to Oracle, thanks to the option of private or public networks between the Oracle cloud and other cloud providers.

Finally, even if the cloud frees you from hardware management and brings your provisioning time down to a few minutes, you will still need system engineers to manage access, resources, day-to-day operations, backups and your disaster recovery plan.

Did I forget something? Feel free to leave your question below and I will be happy to answer it.

This article Your cloud is just waiting for you appeared first on the dbi services blog.

SUSE Expert Day Zürich

Tue, 2019-05-28 01:44

On May 16th I visited the SUSE Expert Day in Zürich.
An interesting agenda was waiting for me, all under the topic: "My kind of open".

After a small welcome coffee, SUSE started with the keynote by Markus Wolf (Country Manager and Sales Director ALPS Region). After a short introduction of the Swiss SUSE team, he talked about the IT transformation and his vision of the next years in IT; nice to hear that IT is going to get even more complex than it is now.
One slide that really impressed me:

Amazing, isn’t it?

As a customer story, Nicolas Christener, CEO and CTO of adfinis sygroup, showed with an impressive example what you can achieve with the SUSE Cloud Application Platform and what matters to the end customer. He also mentioned the great collaboration with SUSE during the project. I think it is really nice to know that you get the help and support from SUSE that you need, especially in new pioneering projects.

The third speaker on stage was Bo Jin (Sales Engineer and Consultant at SUSE). Really impressive knowledge! He talked a lot about Cloud Foundry as well as about Kubernetes, the Cloud Application Platform and CaaS. A highlight for me was his really impressive demo about pushing code to Cloud Foundry and how to deploy from GitHub into a container. Everything seems to be really easy to manage.

Last but not least we got some insight into SUSE Manager and how it can help you to centralize system administration and patch handling as well as AutoYaST profiles and kickstart files. The tool is suitable for Ubuntu, CentOS, Red Hat and, of course, SUSE servers. Everything is handled centrally for almost all distributions. That makes life much easier.
Bo Jin also showed us kernel live patching in a demo and gave us some background information. Did you know, for example, that even with Kernel Live Patching enabled you have to reboot at least once every 12 months?

In a nutshell: it was nice to see how passionate and innovative SUSE is, and they presented great tools. Even though they could only show us the tools currently in scope, I can't wait to test them!

This article SUSE Expert Day Zürich appeared first on the dbi services blog.

Configuring Oracle DB data source in JBoss EAP 7.1

Sun, 2019-05-26 02:05

Introduction

This blog explains how to install and use an Oracle database JDBC driver in a JBoss EAP 7.1 standalone instance and in a domain deployment.

Oracle JDBC driver installation
The first step is to install the JDBC driver in the JBoss installation. This can be done by copying the files to the right directory or by using the JBoss CLI to do the install properly.
I will use the JBoss CLI script for this.

Start the JBoss CLI without connecting to any JBoss instance.

/opt/jboss-eap-7.1/bin/jboss-cli.sh

Then use the module add CLI command to install the Oracle JDBC driver to the right place.

module add --name=com.oracle --resources=/home/pascal/jdbc_drivers/ojdbc8.jar --dependencies=javax.api,javax.transaction.api

This will place the driver in the following directory:

$JBOSS_HOME/modules/com/oracle/main

Note: This CLI command has to be run on each host participating in a domain deployment.
Once the module is installed, it can be used to declare the JDBC driver inside the JBoss instance.

Create a Data-Source

For a standalone instance using the default profile:
a. Start jboss-cli.sh to connect to the standalone server and declare the JDBC driver in the JBoss instance

/opt/jboss-eap-7.1/bin/jboss-cli.sh -c --controller=192.168.56.21:9990
[standalone@192.168.56.21:9990 /] /subsystem=datasources/jdbc-driver=oracle:add(driver-name=oracle,driver-module-name=com.oracle,driver-xa-datasource-class-name=oracle.jdbc.driver.OracleDriver)
{"outcome" => "success"}
[standalone@192.168.56.21:9990 /]

b. Confirm the JDBC driver has been declared successfully.

[standalone@192.168.56.21:9990 /] /subsystem=datasources/jdbc-driver=oracle:read-resource
{
    "outcome" => "success",
    "result" => {
        "deployment-name" => undefined,
        "driver-class-name" => undefined,
        "driver-datasource-class-name" => undefined,
        "driver-major-version" => undefined,
        "driver-minor-version" => undefined,
        "driver-module-name" => "com.oracle",
        "driver-name" => "oracle",
        "driver-xa-datasource-class-name" => "oracle.jdbc.driver.OracleDriver",
        "jdbc-compliant" => undefined,
        "module-slot" => undefined,
        "profile" => undefined,
        "xa-datasource-class" => undefined
    }
}

c. Create the data-source pointing to the Oracle Database

[standalone@192.168.56.21:9990 /] data-source add --name=testOracleDS --jndi-name=java:/jdbc/testOracleDS --driver-name=oracle --connection-url=jdbc:oracle:thin:@vm12:1521/orcl --user-name=scott --password=tiger --jta=true --use-ccm=true --use-java-context=true --enabled=true --user-name=scott --password=tiger --max-pool-size=10 --min-pool-size=5 --flush-strategy="FailingConnectionOnly"

d. Confirm the Datasource creation and parameters.

[standalone@192.168.56.21:9990 /] /subsystem=datasources/data-source=testOracleDS:read-resource
{
    "outcome" => "success",
    "result" => {
        "allocation-retry" => undefined,
        "allocation-retry-wait-millis" => undefined,
        "allow-multiple-users" => false,
        "authentication-context" => undefined,
        "background-validation" => undefined,
        "background-validation-millis" => undefined,
        "blocking-timeout-wait-millis" => undefined,
        "capacity-decrementer-class" => undefined,
        "capacity-decrementer-properties" => undefined,
        "capacity-incrementer-class" => undefined,
        "capacity-incrementer-properties" => undefined,
        "check-valid-connection-sql" => undefined,
        "connectable" => false,
        "connection-listener-class" => undefined,
        "connection-listener-property" => undefined,
        "connection-url" => "jdbc:oracle:thin:@vm12:1521/orcl",
        "credential-reference" => undefined,
        "datasource-class" => undefined,
        "driver-class" => undefined,
        "driver-name" => "oracle",
        "elytron-enabled" => false,
        "enabled" => true,
        "enlistment-trace" => false,
        "exception-sorter-class-name" => undefined,
        "exception-sorter-properties" => undefined,
        "flush-strategy" => "FailingConnectionOnly",
        "idle-timeout-minutes" => undefined,
        "initial-pool-size" => undefined,
        "jndi-name" => "java:/jdbc/testOracleDS",
        "jta" => true,
        "max-pool-size" => 10,
        "mcp" => "org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreConcurrentLinkedDequeManagedConnectionPool",
        "min-pool-size" => 5,
        "new-connection-sql" => undefined,
        "password" => "tiger",
        "pool-fair" => undefined,
        "pool-prefill" => undefined,
        "pool-use-strict-min" => undefined,
        "prepared-statements-cache-size" => undefined,
        "query-timeout" => undefined,
        "reauth-plugin-class-name" => undefined,
        "reauth-plugin-properties" => undefined,
        "security-domain" => undefined,
        "set-tx-query-timeout" => false,
        "share-prepared-statements" => false,
        "spy" => false,
        "stale-connection-checker-class-name" => undefined,
        "stale-connection-checker-properties" => undefined,
        "statistics-enabled" => false,
        "track-statements" => "NOWARN",
        "tracking" => false,
        "transaction-isolation" => undefined,
        "url-delimiter" => undefined,
        "url-selector-strategy-class-name" => undefined,
        "use-ccm" => true,
        "use-fast-fail" => false,
        "use-java-context" => true,
        "use-try-lock" => undefined,
        "user-name" => "scott",
        "valid-connection-checker-class-name" => undefined,
        "valid-connection-checker-properties" => undefined,
        "validate-on-match" => undefined,
        "connection-properties" => undefined,
        "statistics" => {
            "jdbc" => undefined,
            "pool" => undefined
        }
    }
}

At this stage, the data-source is available to all applications deployed in the standalone server. I started with the standalone.xml default profile configuration file. To have additional subsystems, another standalone profile configuration file should be used.
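
A simple way to verify that the data-source can actually reach the Oracle database is the test-connection-in-pool operation (this assumes, of course, that the database behind jdbc:oracle:thin:@vm12:1521/orcl is up and reachable):

[standalone@192.168.56.21:9990 /] /subsystem=datasources/data-source=testOracleDS:test-connection-in-pool
{
    "outcome" => "success",
    "result" => [true]
}
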
For a JBoss domain using the full-ha profile
The domain I am using in my tests has a domain controller and two slave hosts running two servers each, organized in two server groups.
a. Start jboss-cli.sh to connect to the domain master

/opt/jboss-eap-7.1/bin/jboss-cli.sh -c --controller=192.168.56.21:9990

b. Register the Oracle JDBC driver

[domain@192.168.56.21:9990 /] /profile=full-ha/subsystem=datasources/jdbc-driver=oracle:add(driver-name=oracle,driver-module-name=com.oracle,driver-xa-datasource-class-name=oracle.jdbc.driver.OracleDriver)
{
    "outcome" => "success",
    "result" => undefined,
    "server-groups" => {"Group2" => {"host" => {
        "host1" => {"server-two" => {"response" => {
            "outcome" => "success",
            "result" => undefined
        }}},
        "host2" => {"server-four" => {"response" => {
            "outcome" => "success",
            "result" => undefined
        }}}
    }}}
}
[domain@192.168.56.21:9990 /]

In the JBoss domain I used for the testing, the full-ha profile was used when creating the Group2 server group.
c. Confirm the JDBC driver has been declared successfully.

[domain@192.168.56.21:9990 /] /profile=full-ha/subsystem=datasources/jdbc-driver=oracle:read-resource
{
    "outcome" => "success",
    "result" => {
        "deployment-name" => undefined,
        "driver-class-name" => undefined,
        "driver-datasource-class-name" => undefined,
        "driver-major-version" => undefined,
        "driver-minor-version" => undefined,
        "driver-module-name" => "com.oracle",
        "driver-name" => "oracle",
        "driver-xa-datasource-class-name" => "oracle.jdbc.driver.OracleDriver",
        "jdbc-compliant" => undefined,
        "module-slot" => undefined,
        "profile" => undefined,
        "xa-datasource-class" => undefined
    }
}
[domain@192.168.56.21:9990 /]

d. Create the Data source pointing to the Oracle Database

[domain@192.168.56.21:9990 /] data-source add --profile=full-ha --name=testOracleDS --jndi-name=java:/jdbc/testOracleDS --driver-name=oracle --connection-url=jdbc:oracle:thin:@vm12:1521/orcl --user-name=scott --password=tiger --jta=true --use-ccm=true --use-java-context=true --enabled=true --user-name=scott --password=tiger --max-pool-size=10 --min-pool-size=5 --flush-strategy="FailingConnectionOnly"

e. Confirm the data source has been created correctly

[domain@192.168.56.21:9990 /] /profile=full-ha/subsystem=datasources/data-source=testOracleDS:read-resource
{
    "outcome" => "success",
    "result" => {
        "allocation-retry" => undefined,
        "allocation-retry-wait-millis" => undefined,
        "allow-multiple-users" => false,
        "authentication-context" => undefined,
        "background-validation" => undefined,
        "background-validation-millis" => undefined,
        "blocking-timeout-wait-millis" => undefined,
        "capacity-decrementer-class" => undefined,
        "capacity-decrementer-properties" => undefined,
        "capacity-incrementer-class" => undefined,
        "capacity-incrementer-properties" => undefined,
        "check-valid-connection-sql" => undefined,
        "connectable" => false,
        "connection-listener-class" => undefined,
        "connection-listener-property" => undefined,
        "connection-url" => "jdbc:oracle:thin:@vm12:1521/orcl",
        "credential-reference" => undefined,
        "datasource-class" => undefined,
        "driver-class" => undefined,
        "driver-name" => "oracle",
        "elytron-enabled" => false,
        "enabled" => true,
        "enlistment-trace" => false,
        "exception-sorter-class-name" => undefined,
        "exception-sorter-properties" => undefined,
        "flush-strategy" => "FailingConnectionOnly",
        "idle-timeout-minutes" => undefined,
        "initial-pool-size" => undefined,
        "jndi-name" => "java:/jdbc/testOracleDS",
        "jta" => true,
        "max-pool-size" => 10,
        "mcp" => "org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreConcurrentLinkedDequeManagedConnectionPool",
        "min-pool-size" => 5,
        "new-connection-sql" => undefined,
        "password" => "tiger",
        "pool-fair" => undefined,
        "pool-prefill" => undefined,
        "pool-use-strict-min" => undefined,
        "prepared-statements-cache-size" => undefined,
        "query-timeout" => undefined,
        "reauth-plugin-class-name" => undefined,
        "reauth-plugin-properties" => undefined,
        "security-domain" => undefined,
        "set-tx-query-timeout" => false,
        "share-prepared-statements" => false,
        "spy" => false,
        "stale-connection-checker-class-name" => undefined,
        "stale-connection-checker-properties" => undefined,
        "statistics-enabled" => false,
        "track-statements" => "NOWARN",
        "tracking" => false,
        "transaction-isolation" => undefined,
        "url-delimiter" => undefined,
        "url-selector-strategy-class-name" => undefined,
        "use-ccm" => true,
        "use-fast-fail" => false,
        "use-java-context" => true,
        "use-try-lock" => undefined,
        "user-name" => "scott",
        "valid-connection-checker-class-name" => undefined,
        "valid-connection-checker-properties" => undefined,
        "validate-on-match" => undefined,
        "connection-properties" => undefined
    }
}
[domain@192.168.56.21:9990 /]

At this stage, all servers in the server group Group2 have been targeted with the data-source, and all applications deployed to those servers can use it.
As the data-source has been targeted to the profile, all server groups created with this profile will allow their JBoss server instances to use it.

This article Configuring Oracle DB data source in JBoss EAP 7.1 appeared first on the dbi services blog.

APEX Connect 2019 – Day 3

Thu, 2019-05-09 11:20

For the last conference day, after the keynote about "JavaScript, Why Should I Care?" by Dan McGhan, I decided to attend some JavaScript learning sessions to improve myself and presentations on the following topics:
– How to hack your APEX App… (only for testing)
– What you need to know about APEX validations

I also got the chance to have a 1:1 talk with Anthony Rayner to share some wishes about APEX and to talk about an issue with the interactive grid search.

JavaScript programming language

Nowadays being a good APEX developer means being a full stack developer who masters different areas:
– Server side (database, data modeling, SQL, PL/SQL)
– Client side (HTML, CSS, JavaScript)
So, even if JavaScript felt weird from the beginning, you cannot avoid learning and mastering it. It is simply the number one most used programming language (thanks to the web). Thinking that APEX Dynamic Actions can solve every issue by hiding the complexity of JavaScript just isn't possible anymore. Some statistics show that the APEX team is already putting a lot of effort into JavaScript, as it makes up more than 50% of the APEX code, way ahead of PL/SQL.
A couple of characteristics about JavaScript:
– It is dynamically typed, meaning that the type is not attached to the variable but to the value assigned to it. This can somehow be seen as polymorphism.
– It’s case sensitive
– 0 based array index (PL/SQL being a 1 based array index)
– There are no procedures, only functions
– Functions can be given other functions as parameter
– There is one convention: functions starting with an uppercase letter are meant to be used with the new operator
While developing JavaScript your best friends are the web browser developer tools, which allow you to do a lot locally and test it before moving to the server and sharing with other developers and users.
There are a lot of resources on the internet to support the copy/paste way of working of JavaScript developers, so there is a big chance that someone already did what you need. Just pay attention to licensing.
In APEX, JavaScript can be encapsulated in Dynamic Actions, but try to keep that code as short as possible.
Oracle is also providing a very useful free open source JavaScript development toolkit: JET (JavaScript Extension Toolkit)
It is already integrated in APEX through the charts.

How to hack your APEX App… (only for testing)

As APEX generates web applications, they are exposed to the same dangers as any other web application, like SQL injection, XSS (cross-site scripting) and so on.
There is no excuse to ignore security issues because the application is only used on the intranet or because you think no one will ever find the issue…
… Security is part of the job as a developer. Danger can come from the outside, but also from the inside with social engineering based hacking.
It is very easy to find hacker tools on the internet, like Kali Linux, based on Debian, which provides more than 600 tools for penetration testing, for example BeEF (Browser Exploitation Framework).
In APEX the golden rule says: "Don't turn off escaping on your pages".
Remember: "Security is hard. If it's easy you're not doing it right", so don't forget it in your project planning.

What you need to know about APEX validations

There are 2 types of validations with web applications:
– Client side
– Server side
APEX makes use of both and sometimes even combines them, but server side is the most used.
Where possible, client side validation should be used as well, as it is lighter (less network traffic), but be careful: it can be bypassed with the developer tools, as it is based on HTML attributes or JavaScript. That's where server side validation will be your second line of defense, and the database triggers and constraints your very last line of defense.
Validations can make use of data patterns (regular expressions).
Interactive grid validation can also be improved significantly with JavaScript and Dynamic Actions fired on value changes and/or on page submission.

There is always more to learn and, thanks to the community, a lot of information is available. So keep sharing.
Enjoy APEX!

This article APEX Connect 2019 – Day 3 appeared first on the dbi services blog.

APEX Connect 2019 – Day 2

Wed, 2019-05-08 18:17

The day first started with a 5K fun run.

After the keynote about "APEX: Where we have come from and where we're heading: Personal insights on Application Express from the Development Team" by John Snyders, Anthony Rayner and Carsten Czarski, explaining their work on APEX and some upcoming features, I decided to attend presentations on the following topics:
– Know your Browser Dev Tools!
– APEX Devops with Database tooling
– Klick, Klick, Test – Automated Testing for APEX
– Sponsored Session Pitss: Migration from Oracle Forms to APEX – Approaches compared
– APEX-Actions

Potential upcoming features in APEX 19.2:
– APEX_EXEC enabled Interactive Grids
– Enhanced LOV
– New popup LOV
– Filter reports

Know your Browser Dev Tools!

Every web browser has its own set of developer tools, but all of them mainly provide the following functionalities:
– Manipulate HTML in the DOM tree
– Execute JavaScript
– Apply CSS
– Client side debugging and logging
– Analyze network activity
– Simulate screen sizes
The most popular and most complete set of tools is provided by Google Chrome with:
– Elements
– Console
– Sources
– Network
– Performance
– Memory
– Application
– Security
– Audits
Note that if you want to see console output as well as details of Dynamic Actions from APEX, you need to activate debug mode in your application.

APEX Devops with Database tooling

One of the goals of DevOps is to bring development and operations closer together and to make the deployment of applications smoother.
In order to achieve that, 100% automation of the following tasks helps a lot:
– Build
– Test
– Release
– Deploy
This is mainly supported by RESTful services within Oracle, ORDS being the cornerstone.
Besides that, Jenkins has been replaced by GitLab, which has better web services support.
Database changes are tracked with Liquibase, which is integrated and enhanced in SQLcl. Vault is also integrated in SQLcl to ease and automate password management.
Another target of DevOps is zero downtime. This can be supported with tools like consul.io and fabiolb, which make it possible to dynamically add ORDS servers behind dynamic load balancing.

Klick, Klick, Test – Automated Testing for APEX

There are lots of automated testing tools on the market, but they are mostly restricted to specific web browsers.
The target is to have a solution that fits most of them and allows testing of APEX web applications.
It needs a testing framework to abstract the scenario from the underlying tool: codecept.io
As the code generated by the testing framework is standardized, it can be generated from an analysis of the APEX metadata with the help of a templating tool: handlebars
The process is then supported by an APEX application that can retrieve the applications from the workspace and manage the dedicated test scenarios as well as trigger them on Docker containers.

Sponsored Session Pitss: Migration from Oracle Forms to APEX – Approaches compared

Migrating Forms applications to APEX can be very cumbersome as they can be very large and complex.
The main phases of such a migration are:
– Forms application preparation and analysis
– Migration
– APEX application fine tuning
– Rollout
The success of such a migration lies in the combination of skilled Forms developers and APEX developers.

APEX-Actions

Besides the well-known Dynamic Actions in APEX, there is a JavaScript library introduced in APEX 5.0: apex.actions
Documentation for it came with version 5.1 in the APEX JavaScript API documentation.
It is used by the APEX development team in the Page Designer and is now available to all developers.
Actions allow you to centrally encapsulate and define the rendering, the associated function and the shortcuts of objects on the web pages, all of it dynamically.
It uses an observer, which allows having the same behavior for multiple objects of the same type on the same page.

The day ended with Sponsor Pitches & Lightning Talks:
– APEX Competition Winner Plugin
– 5 things that make your life easier when using Docker
– Verwenden Sie kein PL/SQL! (Don't use PL/SQL!)
– Improving Tech with Compassionate Coding

This article APEX Connect 2019 – Day 2 appeared first on the dbi services blog.

APEX Connect 2019 – Day 1

Tue, 2019-05-07 17:33

This year again the APEX Connect conference spans three days with mixed topics around APEX, like JavaScript, PL/SQL and much more.
After the welcome speech and the very funny and interesting keynote about "PL/SQL: A Career Built On Top Of PL/SQL – 25 years of Coding, Sharing, and Learning" by Martin Widlake, I decided to attend presentations on the following topics:
– Oracle Text – A Powerful Tool for Oracle APEX Developers
– Make It Pretty! MIP your APEX application!
– REST Services without APEX – but with APEX_EXEC
– Microservices with APEX
– SQL Magic!
– The UX of forms

PL/SQL: A Career Built On Top Of PL/SQL – 25 years of Coding, Sharing, and Learning:

Martin Widlake shared the story of 25 years of development on Oracle, from version 6 to the newest 19c.
The most important thing to take away from his professional journey is that "good developers are made by other developers" and "everything you learn will have some return sometime in the future". That means sharing is the key: keep yourself curious and never stop learning, even things that are not yet obviously useful.

Oracle Text – A Powerful Tool for Oracle APEX Developers

This feature has been embedded as a standard in Oracle databases since 1997, when it was named ConText. In 1999 it became interMedia Text and finally Oracle Text in 2001. It allows indexing of text based fields of the database as well as of files stored in BLOBs, allowing much faster and easier searches for text patterns (words, sentences, …). We went through aspects like syntax, fuzzy search, snippets and the lexer.
As search combinations require specific operators and delimiters, which are cumbersome for end users, there is a useful package written by Roger Ford that converts simple "Google"-like queries into the right format for Oracle Text: PARSER download
His blog provides nice information about it, and the Oracle documentation covers all the details of using Oracle Text.
You can find further information in the following blog post:
Oracle text at a glance
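
To make this a bit more concrete, here is a minimal sketch of how a CONTEXT index is typically created and queried; the table, column and search terms are made up for illustration, and your user needs the privileges to use the CTXSYS.CONTEXT indextype:

SQL> create table docs ( id number primary key, body clob );

Table created.

SQL> create index docs_body_idx on docs ( body ) indextype is ctxsys.context;

Index created.

SQL> select id from docs where contains ( body, 'apex AND security', 1 ) > 0;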

Make It Pretty! MIP your APEX application!

The business logic is the heart of the application but the UI is its face and what users will judge first.
There are a few rules that help to Make It Pretty (MIP).
First of all it needs to fulfill user needs by either:
– sticking to company brand rules
– sticking to company webpage design
– sticking to user wishes (a draft can be made with Template Monster)
Technical and non-technical aspects need to be considered.
The following design rules help to improve the UI:
– Be consistent
– Make it intuitive
– Be responsive (give feedback to users)
– Keep it simple (not crowded)
– Fonts: max 2 per page, 16px rule (verify on fontpair.co)
– Color rules (verify on contrast checker)
– Have imagination
– Know your APEX universal theme

REST Services without APEX – but with APEX_EXEC

APEX is based on metadata to store its definitions, and PL/SQL packages support the engine.
That means the APEX metadata support can be used outside an APEX application in any PL/SQL code.
One particular APEX PL/SQL package is APEX_EXEC, introduced in APEX 18.1. It abstracts the data format (XML, JSON, …) of web sources so that the data can be used as if it came from any local table. It also takes care of the pagination of web services to make data retrieval transparent. But in order to make use of that package, an APEX session must first be created to initialize the needed metadata. Fortunately, this has been made easy since APEX 18.1 with the procedure create_session from the apex_session package.
The next version, APEX 19.2, might integrate web source modules with the interactive grid.
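
Coming back to apex_session.create_session: a minimal sketch of using it to run APEX_EXEC code in plain PL/SQL could look like this (the application id, page id and username are placeholders, not values from the talk):

begin
  -- create an APEX session so that APEX_EXEC has the application context it needs
  apex_session.create_session (
    p_app_id   => 100,       -- placeholder application id
    p_page_id  => 1,         -- placeholder page id
    p_username => 'DEMO' );  -- placeholder APEX user
  -- from here on, apex_exec can be called outside of an APEX page
end;
/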

Microservices with APEX

APEX can be compared to microservices by looking at the characteristics:
– Scalability
– Fail-safe
– Maintainable
– Technology independent
– Database independent
– Small
And it mostly matches!
But APEX also avoids the usual microservices drawbacks:
– Complex architecture
– Complex testing
– Migration efforts
– Complex development
To get a behavior close to microservices, APEX applications have to make use of web services for data management and for interfacing with any kind of other service. This allows a clear separation of data management and applications. ORDS allows enabling REST at schema level and also at object level within APEX. Caching also needs to be considered, based on the data change frequency, to lower the lag time of data handling.
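
As an illustration of the REST enabling mentioned above, enabling a schema with the ORDS PL/SQL API looks roughly like this (the schema name and URL pattern are placeholders):

begin
  ords.enable_schema (
    p_enabled             => true,
    p_schema              => 'HR',        -- placeholder schema name
    p_url_mapping_type    => 'BASE_PATH',
    p_url_mapping_pattern => 'hr',        -- placeholder URL pattern
    p_auto_rest_auth      => false );
  commit;
end;
/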

SQL Magic!

Since Oracle 12c the database provides the JSON data guide, which allows easy manipulation of JSON data like any standard table data. It also comes with new views like user_json_data_guide.
Oracle 12c introduced invisible columns, which hide columns from the table description as well as from a plain "select *", but not from explicit select statements. This can be used to deprecate columns or to add new columns without breaking existing applications that rely on "select *" (even though "select *" should be avoided in applications).
Oracle 18c introduced polymorphic table functions, which can be used together with pipelined tables to create views that pivot and transpose tables whatever number of columns and rows they have.
All those features are very useful and deserve to be used more.
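
A quick sketch of the invisible column feature mentioned above (the table and column names are made up):

SQL> alter table customers add ( legacy_flag varchar2(1) invisible );

Table altered.

SQL> -- DESCRIBE and "select *" will not show the column, an explicit reference still works
SQL> select customer_id, legacy_flag from customers;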

The UX of forms

User eXperience (UX) rules to be applied in forms go beyond APEX. They aim to:
– Reduce cognitive load
– Prevent errors
– Make it user friendly
The rules are the following:
– One thing per page
– Put field labels above the fields rather than beside them
– Replace small dropdowns by radio buttons
– Use Interactive data lists for long dropdowns
– For login pages, be specific about the username type (name, e-mail) and the password definition rules
– Avoid * for required fields; rather flag optional fields
– Adapt the field size to the expected data length
– Use smart default values
– Use entry masks
– Use date picker
– Define a "check before you start" pattern to guide users and reduce form length
All that will improve UX.

This article APEX Connect 2019 – Day 1 appeared first on the dbi services blog.

Oracle 19C : Exporting and Importing Broker Configuration

Sat, 2019-05-04 06:57

Up to Oracle 19c there was no automatic way to back up the configuration of the broker. One solution was to manually copy all executed instructions into a file.
With Oracle 19c there is now the possibility to export and import the configuration of the broker. Indeed, the new EXPORT CONFIGURATION command enables you to save the metadata contained in the broker configuration file to a text file. This can be very useful if I have to recreate my configuration.
In this blog I have tested this command with the following configuration:

DGMGRL> show configuration

Configuration - db19c

  Protection Mode: MaxAvailability
  Members:
  DB19C_SITE1 - Primary database
    DB19C_SITE2 - Physical standby database
    DB19C_SITE3 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 47 seconds ago)

DGMGRL>

We can see the syntax of the EXPORT command with the help command:

DGMGRL> help export

Export Data Guard Broker configuration to a file.

Syntax:

  EXPORT CONFIGURATION [TO <file name>];

DGMGRL>

Now let’s export the configuration

DGMGRL> EXPORT CONFIGURATION TO db19c_config.txt
Succeeded.
DGMGRL>

The file is generated in the trace files directory.
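
If you are not sure where that trace directory is on your system, v$diag_info will tell you (the path below is from this test environment and will differ on yours):

SQL> select value from v$diag_info where name = 'Diag Trace';

VALUE
--------------------------------------------------------------
/u01/app/oracle/diag/rdbms/db19c_site1/DB19C/trace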

[oracle@primaserver trace]$ pwd
/u01/app/oracle/diag/rdbms/db19c_site1/DB19C/trace
[oracle@primaserver trace]$ ls -l db19c_config.txt
-rw-r--r--. 1 oracle oinstall 8469 May  4 12:25 db19c_config.txt
[oracle@primaserver trace]$

Let’s remove the configuration

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXPERFORMANCE;
Succeeded.
DGMGRL> remove configuration;
Removed configuration

DGMGRL> show configuration
ORA-16532: Oracle Data Guard broker configuration does not exist

Configuration details cannot be determined by DGMGRL
DGMGRL>

Now let’s use the IMPORT command to rebuild the configuration

DGMGRL> IMPORT CONFIGURATION FROM db19c_config.txt
Succeeded. Run ENABLE CONFIGURATION to enable the imported configuration.
DGMGRL>

As we can see, the configuration is disabled after the import:

DGMGRL> show configuration

Configuration - db19c

  Protection Mode: MaxAvailability
  Members:
  DB19C_SITE1 - Primary database
    DB19C_SITE2 - Physical standby database
    DB19C_SITE3 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
DISABLED

DGMGRL>

So let’s enable it

DGMGRL> ENABLE CONFIGURATION
Enabled.
DGMGRL> show configuration

Configuration - db19c

  Protection Mode: MaxAvailability
  Members:
  DB19C_SITE1 - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode

    DB19C_SITE2 - Physical standby database
    DB19C_SITE3 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 6 seconds ago)

DGMGRL>

The warning is due to the fact that the protection mode was set to MaxPerformance in order to be able to remove the configuration. Let's set it back to MaxAvailability:

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY ;
Succeeded.
DGMGRL> show configuration

Configuration - db19c

  Protection Mode: MaxAvailability
  Members:
  DB19C_SITE1 - Primary database
    DB19C_SITE2 - Physical standby database
    DB19C_SITE3 - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 37 seconds ago)

DGMGRL>

Now let’s run the EXPORT command without specifying a file name

DGMGRL> EXPORT CONFIGURATION
Succeeded.
DGMGRL>

In the trace directory we can see that a default file name is generated:

DB19C_dmon_5912_brkmeta_1.trc

If we run the export command again

DGMGRL> EXPORT CONFIGURATION
Succeeded.
DGMGRL>

A second default file is created

DB19C_dmon_5912_brkmeta_2.trc
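Since EXPORT CONFIGURATION is just a DGMGRL command, it can also be scripted and scheduled to keep regular backups of the broker metadata; a minimal sketch, assuming OS authentication on the primary host (the file name pattern is only an example):

# export the broker configuration with a dated file name (written to the trace directory, as seen above)
dgmgrl / "EXPORT CONFIGURATION TO broker_config_$(date +%Y%m%d).txt"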

This article Oracle 19C : Exporting and Importing Broker Configuration first appeared on the dbi services blog.

Oracle 19C : Dynamic Change of Fast-Start Failover Target

Fri, 2019-05-03 15:04

Oracle 19C is now available on premises. There are a lot of new features. One of them, for the Data Guard Broker, is that we can now dynamically change the fast-start failover target to a specified member in the target list without disabling fast-start failover.
I have tested this new feature and describe it in this blog.
I am using 3 servers with Oracle Linux.
The Data Guard environment is already built and the broker is already configured.
To enable fast-start failover there are some requirements. Note that flashback database must be enabled on the primary and on the fast-start failover target standby.
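A quick way to verify this on each database; a small sketch, assuming OS authentication on each host:

# FLASHBACK_ON must show YES before enabling fast-start failover
sqlplus -s "/ as sysdba" <<'EOF'
select db_unique_name, flashback_on from v$database;
EOF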
First we set the redo transport mode to SYNC for the 3 databases:

DGMGRL> edit database 'DB19C_SITE1' set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'DB19C_SITE2' set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'DB19C_SITE3' set property LogXptMode='SYNC';
Property "logxptmode" updated

Then we change the protection mode to MaxAvailability:

DGMGRL>  EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Succeeded.

Then we enable fast-start failover:

DGMGRL> enable fast_start failover;
Enabled in Zero Data Loss Mode.

Below is the status of the configuration, where we can see that DB19C_SITE2 is the fast-start failover target:

DGMGRL> show configuration

Configuration - db19c

  Protection Mode: MaxAvailability
  Members:
  DB19C_SITE1 - Primary database
    DB19C_SITE2 - (*) Physical standby database
    DB19C_SITE3 - Physical standby database

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
SUCCESS   (status updated 55 seconds ago)

DGMGRL>

The observer status also shows DB19C_SITE2 as the active target:

DGMGRL> show observer

Configuration - db19c

  Primary:            DB19C_SITE1
  Active Target:      DB19C_SITE2

Observer "standserver2" - Master

  Host Name:                    standserver2
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          4 seconds ago

DGMGRL>

For example, let's say we want to switch over to DB19C_SITE3:

DGMGRL> switchover to 'DB19C_SITE3';
Performing switchover NOW, please wait...
Error: ORA-16655: specified standby database not the current fast-start failover target standby

Failed.
Unable to switchover, primary database is still "DB19C_SITE1"
DGMGRL>

As we can see, we cannot, because the current fast-start failover target is DB19C_SITE2. We have to change it to DB19C_SITE3 first.
To do this change dynamically, we use the SET FAST_START FAILOVER TARGET command:

DGMGRL> SET FAST_START FAILOVER TARGET TO 'DB19C_SITE3';
Waiting for Fast-Start Failover target to change to "DB19C_SITE3"...
Succeeded.
DGMGRL>

We can query the broker to verify the change

DGMGRL> show configuration

Configuration - db19c

  Protection Mode: MaxAvailability
  Members:
  DB19C_SITE1 - Primary database
    DB19C_SITE3 - (*) Physical standby database
    DB19C_SITE2 - Physical standby database

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
SUCCESS   (status updated 22 seconds ago)

DGMGRL>

And now I can switch over to DB19C_SITE3:

DGMGRL> switchover to 'DB19C_SITE3';
Performing switchover NOW, please wait...
New primary database "DB19C_SITE3" is opening...
Operation requires start up of instance "DB19C" on database "DB19C_SITE1"
Starting instance "DB19C"...
Connected to an idle instance.
ORACLE instance started.
Connected to "DB19C_SITE1"
Database mounted.
Database opened.
Connected to "DB19C_SITE1"
Switchover succeeded, new primary is "DB19C_SITE3"
DGMGRL>

And here is the new status of the configuration:

DGMGRL> show configuration

Configuration - db19c

  Protection Mode: MaxAvailability
  Members:
  DB19C_SITE3 - Primary database
    DB19C_SITE1 - (*) Physical standby database
    DB19C_SITE2 - Physical standby database

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

DGMGRL>

This article Oracle 19C : Dynamic Change of Fast-Start Failover Target first appeared on the dbi services blog.

Unable to add physical standby database

Tue, 2019-04-30 10:00

Recently I tried to set up some new physical standby databases and got the following strange message:


DGMGRL> ADD DATABASE "XXXX" as connect identifier is "XXXX" maintained as physical;
Error: ORA-16698: LOG_ARCHIVE_DEST_n parameter set for object to be added

Often you can read that you have to check the log_archive_dest_2 parameter on the primary side. But this error message can also occur if log_archive_dest_2 on the standby side points to another standby database.
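Before clearing it, it is worth checking what the parameter currently contains on the standby to be added; a quick sketch, assuming OS authentication on that host:

# show where log_archive_dest_2 currently points on the standby to be added
sqlplus -s "/ as sysdba" <<'EOF'
show parameter log_archive_dest_2
EOF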

The solution is to run the following command on the standby database to be added:


alter system set log_archive_dest_2 = '';

This article Unable to add physical standby database first appeared on the dbi services blog.

Oracle 18c Grid Infrastructure on Windows Server

Sat, 2019-04-27 10:05

Oracle Grid Infrastructure can be installed on the Windows platform. The steps are the same as on other platforms. In this blog we are going to install Oracle GI 18c on Windows Server 2016. I have two disks on my server:
Disk 0 : for the system
Disk 1 : for the ASM
I am using a VirtualBox virtual machine.
We assume that the grid infrastructure software is already downloaded and decompressed in the grid home.
Like on other platforms, we have to configure the ASM disk. In the documentation we can read:
The only partitions that OUI displays for Windows systems are logical drives that are on disks and have been marked (or stamped) with asmtoolg or by Oracle Automatic Storage Management (Oracle ASM) Filter Driver.
So Disk 1 should not be formatted and should not be assigned to a drive letter.
The first step is then to create a logical partition using the Windows diskpart utility:

Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\Users\Administrator>diskpart

Microsoft DiskPart version 10.0.14393.0

Copyright (C) 1999-2013 Microsoft Corporation.
On computer: RACWIN2

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online           60 GB      0 B
  Disk 1    Online           20 GB    20 GB

DISKPART> select disk 1

Disk 1 is now the selected disk.

DISKPART> create partition extended

DiskPart succeeded in creating the specified partition.

DISKPART> create partition logical

DiskPart succeeded in creating the specified partition.

DISKPART>

We can then list the existing partitions on Disk 1:

DISKPART> list partition

  Partition ###  Type              Size     Offset
  -------------  ----------------  -------  -------
  Partition 0    Extended            19 GB  1024 KB
* Partition 1    Logical             19 GB  2048 KB

DISKPART>

Once the logical partition is created, we can launch the asmtool or asmtoolg utility. This utility comes with the grid software:

c:\app\grid\18000\bin>asmtoolg.exe

The first time we executed the asmtoolg.exe command, we got the following error:

According to the Oracle Support note "Windows: asmtoolg: MSVCR120.dll is missing from your computer (Doc ID 2251869.1)", we have to download and install the Visual C++ 2013 Redistributable Package.
Once done, we launch the asmtoolg utility again:

Clicking on next, we can choose the disk we want to stamp for ASM

Click on Next

Click on Next

And click on Finish. We can then list the disks marked for ASM with the asmtool utility:

C:\Users\Administrator>cd c:\app\18000\grid\bin

c:\app\18000\grid\bin>asmtool.exe -list
NTFS                             \Device\Harddisk0\Partition1              500M
NTFS                             \Device\Harddisk0\Partition2            60938M
ORCLDISKDATA0                    \Device\Harddisk1\Partition1            20477M
c:\app\18000\grid\bin>

Now it’s time to launch the gridSetup executable

c:\app\grid\18000>gridSetup.bat






We decide to ignore the Warning


At the end, we got an error from the cluster verification utility. But this is expected because we ignored some prerequisites.


We can verify that the installation was fine:

c:\app\18000\grid>crsctl status resource -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       racwin2                  STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       racwin2                  STABLE
ora.asm
               ONLINE  ONLINE       racwin2                  Started,STABLE
ora.ons
               OFFLINE OFFLINE      racwin2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       racwin2                  STABLE
ora.evmd
      1        ONLINE  ONLINE       racwin2                  STABLE
--------------------------------------------------------------------------------

c:\app\18000\grid>

We can connect to the ASM instance

C:\Users\Administrator>set oracle_sid=+ASM

C:\Users\Administrator>sqlplus / as sysasm

SQL*Plus: Release 18.0.0.0.0 - Production on Sat Apr 27 05:49:38 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.


SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED

SQL>

Conclusion
Once the grid infrastructure is configured, we can install the Oracle database software.

This article Oracle 18c Grid Infrastructure on Windows Server first appeared on the dbi services blog.

Creating PostgreSQL users with a PL/pgSQL function

Thu, 2019-04-25 04:07

Sometimes you might want to create users in PostgreSQL using a function. One use case for this is that you want to give other users the possibility to create users without granting them the right to do so. How is that possible? Much the same as in Oracle, you can create functions in PostgreSQL that execute either under the permissions of the user who created the function or under the permissions of the user who executes the function. Let's see how that works.

Here is a little PL/pgSQL function that creates a user with a given password, does some checks on the input parameters and tests if the user already exists:

create or replace function f_create_user ( pv_username name
                                         , pv_password text
                                         ) returns boolean
as $$
declare
  lb_return boolean := true;
  ln_count integer;
begin
  if ( pv_username is null )
  then
     raise warning 'Username must not be null';
     lb_return := false;
  end if;
  if ( pv_password is null )
  then
     raise warning 'Password must not be null';
     lb_return := false;
  end if;
  -- test if the user already exists
  begin
      select count(*)
        into ln_count
        from pg_user
       where usename = pv_username;
  exception
      when no_data_found then
          -- ok, no user with this name is defined
          null;
      when too_many_rows then
          -- this should really never happen
          raise exception 'You have a huge issue in your catalog';
  end;
  if ( ln_count > 0 )
  then
     raise warning 'The user "%" already exist', pv_username;
     lb_return := false;
  else
      -- quote the parameter values instead of concatenating literal strings
      execute 'create user '||quote_ident(pv_username)||' with password '||quote_literal(pv_password);
  end if;
  return lb_return;
end;
$$ language plpgsql;

Once that function is created:

postgres=# \df
                                   List of functions
 Schema |     Name      | Result data type |        Argument data types         | Type 
--------+---------------+------------------+------------------------------------+------
 public | f_create_user | boolean          | pv_username name, pv_password text | func
(1 row)

… users can be created by calling this function when connected as a user with permissions to do so:

postgres=# select current_user;
 current_user 
--------------
 postgres
(1 row)

postgres=# select f_create_user('test','test');
 f_create_user 
---------------
 t
(1 row)

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test      |                                                            | {}

Trying to execute this function with a user that does not have permissions to create other users will fail:

postgres=# create user a with password 'a';
CREATE ROLE
postgres=# grant EXECUTE on function f_create_user(name,text) to a;
GRANT
postgres=# \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> select f_create_user('test2','test2');
ERROR:  permission denied to create role
CONTEXT:  SQL statement "create user test2 with password 'pv_password'"
PL/pgSQL function f_create_user(name,text) line 35 at EXECUTE

You can make that work by saying that the function should run with the permissions of the user who created the function:

create or replace function f_create_user ( pv_username name
                                         , pv_password text
                                         ) returns boolean
as $$
declare
  lb_return boolean := true;
  ln_count integer;
begin
...
end;
$$ language plpgsql security definer;

From now on our user “a” is allowed to create other users:

postgres=> select current_user;
 current_user 
--------------
 a
(1 row)

postgres=> select f_create_user('test2','test2');
 f_create_user 
---------------
 t
(1 row)

postgres=> \du
                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 a         |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test      |                                                            | {}
 test2     |                                                            | {}

Before implementing something like this, consider the "Writing SECURITY DEFINER Functions Safely" section in the documentation. There are some points to take care of, such as this:

postgres=# revoke all on function f_create_user(name,text) from public;
REVOKE

… and correctly setting the search_path.
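One way to do the latter is to pin the search_path directly on the function; a small sketch, run as the function owner (database and function signature as used above):

psql -d postgres -c "alter function f_create_user(name,text) set search_path = pg_catalog, public;"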

This article Creating PostgreSQL users with a PL/pgSQL function first appeared on the dbi services blog.

Direct NFS, ODM 4.0 in 12.2: archiver stuck situation after a shutdown abort and restart

Wed, 2019-04-24 12:23

A customer had an interesting case recently: since Oracle 12.2 he got archiver stuck situations after a shutdown abort and restart. I reproduced the issue; it is caused by direct NFS since it runs ODM 4.0 (i.e. since 12.2). The issue also reproduces on 18.5. When direct NFS is enabled, the archiver process writes to a file with a preceding dot in its name, e.g.:


.arch_1_90_985274359.arc

When the file has been fully copied from the online redo log, it is renamed so that it no longer contains the preceding dot. Using the previous example:


arch_1_90_985274359.arc

When I do a "shutdown abort" while the archiver is in the process of writing to the archive file (with the leading dot in its name) and then restart the database, Oracle is not able to cope with that file. In the alert log I get the following errors:


2019-04-17T10:22:33.190330+02:00
ARC0 (PID:12598): Unable to create archive log file '/arch_backup/gen183/archivelog/arch_1_90_985274359.arc'
2019-04-17T10:22:33.253476+02:00
Errors in file /u01/app/oracle/diag/rdbms/gen183/gen183/trace/gen183_arc0_12598.trc:
ORA-19504: failed to create file "/arch_backup/gen183/archivelog/arch_1_90_985274359.arc"
ORA-17502: ksfdcre:8 Failed to create file /arch_backup/gen183/archivelog/arch_1_90_985274359.arc
ORA-17500: ODM err:File exists
2019-04-17T10:22:33.254078+02:00
ARC0 (PID:12598): Error 19504 Creating archive log file to '/arch_backup/gen183/archivelog/arch_1_90_985274359.arc'
ARC0 (PID:12598): Stuck archiver: inactive mandatory LAD:1
ARC0 (PID:12598): Stuck archiver condition declared

The DB continues to operate normally until it has to overwrite the online redo log file which has not been fully archived yet. At that point the archiver becomes stuck and modifications on the DB are no longer possible.

When I remove the incomplete archive file, the DB continues to operate normally:


rm .arch_1_90_985274359.arc

Using a 12.1 database with ODM 3.0 I didn't see that behavior. I could also see an archived redo log file with a preceding dot in its name, but after a shutdown abort and restart Oracle removed the file itself and there was no archiver problem.

Testcase:

1.) make sure you have direct NFS enabled


cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk dnfs_on

2.) configure a mandatory log archive destination pointing to a NFS-mounted filesystem. E.g.


[root]# mount -t nfs -o rw,bg,hard,rsize=32768,wsize=32768,vers=3,nointr,timeo=600,proto=tcp,suid,nolock,noac nfs_server:/arch_backup /arch_backup
 
SQL> alter system set log_archive_dest_1='location=/arch_backup/gen183/archivelog mandatory reopen=30';

3.) Produce some DML-load on the DB

I created 2 tables t3 and t4 as a copy of all_objects with approx 600’000 rows:


SQL> create table t3 as select * from all_objects;
SQL> insert into t3 select * from t3;
SQL> -- repeat above insert until you have 600K rows in t3
SQL> commit;
SQL> create table t4 as select * from t3;

Run the following PLSQL-block to produce redo:


begin
for i in 1..20 loop
delete from t3;
commit;
insert into t3 select * from t4;
commit;
end loop;
end;
/

4.) While the PLSQL-block of 3.) is running, check the archive files produced in your log archive destination


ls -ltra /arch_backup/gen183/archivelog

Once you see a file created with a preceding dot in its name then shutdown abort the database:


oracle@18cR0:/arch_backup/gen183/archivelog/ [gen183] ls -ltra /arch_backup/gen183/archivelog
total 2308988
drwxr-xr-x. 3 oracle oinstall 23 Apr 17 10:13 ..
-r--r-----. 1 oracle oinstall 2136861184 Apr 24 18:24 arch_1_104_985274359.arc
drwxr-xr-x. 2 oracle oinstall 69 Apr 24 18:59 .
-rw-r-----. 1 oracle oinstall 2090587648 Apr 24 18:59 .arch_1_105_985274359.arc
 
SQL> shutdown abort

5.) If the file with the preceding dot is still there after the shutdown, you have reproduced the issue. Just start up the DB and "tail -f" your alert log file.


oracle@18cR0:/arch_backup/gen183/archivelog/ [gen183] cdal
oracle@18cR0:/u01/app/oracle/diag/rdbms/gen183/gen183/trace/ [gen183] tail -f alert_gen183.log
...
2019-04-24T19:01:24.775991+02:00
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 4.0
...
2019-04-24T19:01:43.770196+02:00
ARC0 (PID:8876): Unable to create archive log file '/arch_backup/gen183/archivelog/arch_1_105_985274359.arc'
2019-04-24T19:01:43.790546+02:00
Errors in file /u01/app/oracle/diag/rdbms/gen183/gen183/trace/gen183_arc0_8876.trc:
ORA-19504: failed to create file "/arch_backup/gen183/archivelog/arch_1_105_985274359.arc"
ORA-17502: ksfdcre:8 Failed to create file /arch_backup/gen183/archivelog/arch_1_105_985274359.arc
ORA-17500: ODM err:File exists
ARC0 (PID:8876): Error 19504 Creating archive log file to '/arch_backup/gen183/archivelog/arch_1_105_985274359.arc'
ARC0 (PID:8876): Stuck archiver: inactive mandatory LAD:1
ARC0 (PID:8876): Stuck archiver condition declared
...

This is a serious problem because it may cause an archiver stuck situation after a crash. I opened a Service Request with Oracle; it has been assigned to the ODM team. Once I get a resolution I'll update this blog.
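Until there is a fix, a simple check after instance startup can at least detect such leftover files; a minimal sketch, using the archive destination from this test:

# list incomplete (dot-prefixed) archive files left over after the crash
find /arch_backup/gen183/archivelog -maxdepth 1 -name '.arch_*.arc' -ls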

This article Direct NFS, ODM 4.0 in 12.2: archiver stuck situation after a shutdown abort and restart first appeared on the dbi services blog.

Bringing up an OpenShift playground in AWS

Wed, 2019-04-17 13:24

Before we begin: this is in no way production ready, as the title states. In a production setup you would put the internal registry on persistent storage, you would probably have more than one master node, and you would probably have more than one compute node. Security is not covered at all here. This post is intended to quickly bring up something you can play with, that's it. Future posts will explore more details of OpenShift. So, let's start.

What I used as a starting point are three t2.xlarge instances:

One of them will be the master, and there will be one infrastructure node and one compute node. All of them are based on the Red Hat Enterprise Linux 7.5 (HVM) AMI:

Once these three instances are running the most important thing is that you set persistent hostnames (if you do not do this the OpenShift installation will fail):

[root@master ec2-user]$ hostnamectl set-hostname --static master.it.dbi-services.com
[root@master ec2-user]$ echo "preserve_hostname: true" >> /etc/cloud/cloud.cfg

Of course you need to do that on all three hosts. Once that is done, because I have no DNS in my setup, /etc/hosts should be adjusted on all the machines, in my case:

[root@master ec2-user]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.1.167  master master.it.dbi-services.com
10.0.1.110  node1 node1.it.dbi-services.com
10.0.1.13   node2 node2.it.dbi-services.com

As everything is based on RedHat you need to register all the machines:

[root@master ec2-user]$ subscription-manager register
Registering to: subscription.rhsm.redhat.com:443/subscription
Username: xxxxxx
Password: 
The system has been registered with ID: xxxxxxx
The registered system name is: master

Once done, refresh and then list the available subscriptions. There should be at least one named something like "Red Hat OpenShift". Having identified the "Pool ID" for that one, attach it (on all machines):

[root@master ec2-user]$ subscription-manager refresh
[root@master ec2-user]$ subscription-manager list --available
[root@master ec2-user]$ subscription-manager attach --pool=xxxxxxxxxxxxxxxxxxxxxxxxx

Now you are ready to enable the required repositories (on all machines):

[root@master ec2-user]$ subscription-manager repos --enable="rhel-7-server-rpms" \
    --enable="rhel-7-server-extras-rpms" \
     --enable="rhel-7-server-ose-3.11-rpms" \
     --enable="rhel-7-server-ansible-2.6-rpms"

Repository 'rhel-7-server-rpms' is enabled for this system.
Repository 'rhel-7-server-extras-rpms' is enabled for this system.
Repository 'rhel-7-server-ansible-2.6-rpms' is enabled for this system.
Repository 'rhel-7-server-ose-3.11-rpms' is enabled for this system.

With the repos enabled, the required packages can be installed (on all machines):

[root@master ec2-user]$ yum -y install wget git net-tools bind-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct

Updating all packages to the latest release and rebooting to the potentially new kernel is recommended. As we will be using Docker for this deployment we will install that as well (on all machines):

[root@master ec2-user]$ yum install -y docker
[root@master ec2-user]$ yum update -y
[root@master ec2-user]$ systemctl reboot

Now that we are up to date and the prerequisites are met, we create a new group and a new user. Why? The complete OpenShift installation is driven by Ansible. You could run all of the installation directly as root, but a better way is to use a dedicated user that has sudo permissions to perform the required tasks (on all machines):

[root@master ec2-user]$ groupadd dbi
[root@master ec2-user]$ useradd -g dbi dbi

As Ansible needs to log in to all the machines, you will need to set up password-less ssh connections for this user.
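For reference, a minimal sketch of such a setup, run as the dbi user on the master (host names as defined in /etc/hosts above):

# generate a key pair for the dbi user and distribute the public key to all hosts
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
for host in master node1 node2; do
  ssh-copy-id dbi@${host}
done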

Several tasks of the OpenShift Ansible playbooks need to be executed as root so the “dbi” user needs permissions to do that (on all machines):

[root@master ec2-user]$ cat /etc/sudoers | grep dbi
dbi	ALL=(ALL)	NOPASSWD: ALL

There is one last preparation step to be executed on the master only: Installing the Ansible playbooks required to bring up OpenShift:

[root@master ec2-user]$ yum -y install openshift-ansible

That’s all the preparation required for this playground setup. As all the installation is Ansible based we need an inventory file on the master:

[dbi@master ~]$ id -a
uid=1001(dbi) gid=1001(dbi) groups=1001(dbi),994(dockerroot) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[dbi@master ~]$ pwd
/home/dbi
[dbi@master ~]$ cat inventory 
# Create an OSEv3 group that contains the masters, nodes, and etcd groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=dbi
# If ansible_ssh_user is not root, ansible_become must be set to true
ansible_become=true
become_method = sudo
openshift_deployment_type=openshift-enterprise
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'admin': '$apr1$4ZbKL26l$3eKL/6AQM8O94lRwTAu611', 'developer': '$apr1$4ZbKL26l$3eKL/6AQM8O94lRwTAu611'}
# Registry settings
oreg_url=registry.redhat.io/openshift3/ose-${component}:${version}
oreg_auth_user=dbiservices2800
oreg_auth_password=eIJAy7LsyA
# disable checks
openshift_disable_check=disk_availability,docker_storage,memory_availability

openshift_master_default_subdomain=apps.it.dbi-services.com

# host group for masters
[masters]
master.it.dbi-services.com

# host group for etcd
[etcd]
master.it.dbi-services.com

# host group for nodes, includes region info
[nodes]
master.it.dbi-services.com openshift_node_group_name='node-config-master'
node1.it.dbi-services.com openshift_node_group_name='node-config-compute'
node2.it.dbi-services.com openshift_node_group_name='node-config-infra'

If you need more details about all the variables and host groups used here, please check the OpenShift documentation.
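As a side note, the password hashes used in openshift_master_htpasswd_users above can be generated with the htpasswd utility; a quick sketch (user name and password are only examples):

# requires the httpd-tools package; prints user:hash, the hash part goes into the inventory
htpasswd -nb admin MySecretPassword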

In any case, please execute the prerequisites playbook before starting with the installation. If it does not run until the end or shows any "failed" tasks, you need to fix something before proceeding:

[dbi@master ~]$ ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml 

PLAY [Fail openshift_kubelet_name_override for new hosts] **********************************************

TASK [Gathering Facts] *********************************************************************************
ok: [master.it.dbi-services.com]
ok: [node1.it.dbi-services.com]

...

PLAY RECAP *********************************************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0   
master.it.dbi-services.com : ok=80   changed=17   unreachable=0    failed=0   
node1.it.dbi-services.com  : ok=56   changed=16   unreachable=0    failed=0   


INSTALLER STATUS ***************************************************************************************
Initialization  : Complete (0:01:40)

When it is fine, install OpenShift:

[dbi@master ~]$ ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml 

That will take some time but at the end your OpenShift cluster should be up and running:

[dbi@master ~]$ oc login -u system:admin
Logged into "https://master:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-public
    kube-service-catalog
    kube-system
    management-infra
    openshift
    openshift-ansible-service-broker
    openshift-console
    openshift-infra
    openshift-logging
    openshift-monitoring
    openshift-node
    openshift-sdn
    openshift-template-service-broker
    openshift-web-console

Using project "default".

[dbi@master ~]$ oc get nodes 
NAME                         STATUS    ROLES     AGE       VERSION
master.it.dbi-services.com   Ready     master    1h        v1.11.0+d4cacc0
node1.it.dbi-services.com    Ready     compute   1h        v1.11.0+d4cacc0
node2.it.dbi-services.com    Ready     infra     1h        v1.11.0+d4cacc0

As expected there is one master, one infrastructure and one compute node. All the pods in the default namespace should be running fine:

[dbi@master ~]$ oc get pods -n default
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-lmjzs    1/1       Running   0          1h
registry-console-1-n4z5j   1/1       Running   0          1h
router-1-5wl27             1/1       Running   0          1h

All the default Image Streams are there as well:

[dbi@master ~]$ oc get is -n openshift
NAME                                           DOCKER REPO                                                                               TAGS                          UPDATED
apicurito-ui                                   docker-registry.default.svc:5000/openshift/apicurito-ui                                   1.2                           2 hours ago
dotnet                                         docker-registry.default.svc:5000/openshift/dotnet                                         latest,1.0,1.1 + 3 more...    2 hours ago
dotnet-runtime                                 docker-registry.default.svc:5000/openshift/dotnet-runtime                                 2.2,latest,2.0 + 1 more...    2 hours ago
eap-cd-openshift                               docker-registry.default.svc:5000/openshift/eap-cd-openshift                               14.0,15.0,13 + 6 more...      2 hours ago
fis-java-openshift                             docker-registry.default.svc:5000/openshift/fis-java-openshift                             1.0,2.0                       2 hours ago
fis-karaf-openshift                            docker-registry.default.svc:5000/openshift/fis-karaf-openshift                            1.0,2.0                       2 hours ago
fuse-apicurito-generator                       docker-registry.default.svc:5000/openshift/fuse-apicurito-generator                       1.2                           2 hours ago
fuse7-console                                  docker-registry.default.svc:5000/openshift/fuse7-console                                  1.0,1.1,1.2                   2 hours ago
fuse7-eap-openshift                            docker-registry.default.svc:5000/openshift/fuse7-eap-openshift                            1.0,1.1,1.2                   2 hours ago
fuse7-java-openshift                           docker-registry.default.svc:5000/openshift/fuse7-java-openshift                           1.0,1.1,1.2                   2 hours ago
fuse7-karaf-openshift                          docker-registry.default.svc:5000/openshift/fuse7-karaf-openshift                          1.0,1.1,1.2                   2 hours ago
httpd                                          docker-registry.default.svc:5000/openshift/httpd                                          2.4,latest                    2 hours ago
java                                           docker-registry.default.svc:5000/openshift/java                                           8,latest                      2 hours ago
jboss-amq-62                                   docker-registry.default.svc:5000/openshift/jboss-amq-62                                   1.3,1.4,1.5 + 4 more...       2 hours ago
jboss-amq-63                                   docker-registry.default.svc:5000/openshift/jboss-amq-63                                   1.0,1.1,1.2 + 1 more...       2 hours ago
jboss-datagrid73-openshift                     docker-registry.default.svc:5000/openshift/jboss-datagrid73-openshift                     1.0                           
jboss-datavirt63-driver-openshift              docker-registry.default.svc:5000/openshift/jboss-datavirt63-driver-openshift              1.0,1.1                       2 hours ago
jboss-datavirt63-openshift                     docker-registry.default.svc:5000/openshift/jboss-datavirt63-openshift                     1.0,1.1,1.2 + 2 more...       2 hours ago
jboss-decisionserver62-openshift               docker-registry.default.svc:5000/openshift/jboss-decisionserver62-openshift               1.2                           2 hours ago
jboss-decisionserver63-openshift               docker-registry.default.svc:5000/openshift/jboss-decisionserver63-openshift               1.3,1.4                       2 hours ago
jboss-decisionserver64-openshift               docker-registry.default.svc:5000/openshift/jboss-decisionserver64-openshift               1.0,1.1,1.2 + 1 more...       2 hours ago
jboss-eap64-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap64-openshift                          1.7,1.3,1.4 + 6 more...       2 hours ago
jboss-eap70-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap70-openshift                          1.5,1.6,1.7 + 2 more...       2 hours ago
jboss-eap71-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap71-openshift                          1.1,1.2,1.3 + 1 more...       2 hours ago
jboss-eap72-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap72-openshift                          1.0,latest                    2 hours ago
jboss-fuse70-console                           docker-registry.default.svc:5000/openshift/jboss-fuse70-console                           1.0                           2 hours ago
jboss-fuse70-eap-openshift                     docker-registry.default.svc:5000/openshift/jboss-fuse70-eap-openshift                     1.0                           
jboss-fuse70-java-openshift                    docker-registry.default.svc:5000/openshift/jboss-fuse70-java-openshift                    1.0                           2 hours ago
jboss-fuse70-karaf-openshift                   docker-registry.default.svc:5000/openshift/jboss-fuse70-karaf-openshift                   1.0                           2 hours ago
jboss-processserver63-openshift                docker-registry.default.svc:5000/openshift/jboss-processserver63-openshift                1.3,1.4                       2 hours ago
jboss-processserver64-openshift                docker-registry.default.svc:5000/openshift/jboss-processserver64-openshift                1.2,1.3,1.0 + 1 more...       2 hours ago
jboss-webserver30-tomcat7-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver30-tomcat7-openshift            1.1,1.2,1.3                   2 hours ago
jboss-webserver30-tomcat8-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver30-tomcat8-openshift            1.2,1.3,1.1                   2 hours ago
jboss-webserver31-tomcat7-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver31-tomcat7-openshift            1.0,1.1,1.2                   2 hours ago
jboss-webserver31-tomcat8-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver31-tomcat8-openshift            1.0,1.1,1.2                   2 hours ago
jenkins                                        docker-registry.default.svc:5000/openshift/jenkins                                        2,latest,1                    2 hours ago
mariadb                                        docker-registry.default.svc:5000/openshift/mariadb                                        10.1,10.2,latest              2 hours ago
mongodb                                        docker-registry.default.svc:5000/openshift/mongodb                                        2.4,3.2,3.6 + 3 more...       2 hours ago
mysql                                          docker-registry.default.svc:5000/openshift/mysql                                          5.7,latest,5.6 + 1 more...    2 hours ago
nginx                                          docker-registry.default.svc:5000/openshift/nginx                                          1.8,latest,1.10 + 1 more...   2 hours ago
nodejs                                         docker-registry.default.svc:5000/openshift/nodejs                                         8-RHOAR,0.10,6 + 3 more...    2 hours ago
perl                                           docker-registry.default.svc:5000/openshift/perl                                           5.20,5.24,5.16 + 1 more...    2 hours ago
php                                            docker-registry.default.svc:5000/openshift/php                                            5.6,5.5,7.0 + 1 more...       2 hours ago
postgresql                                     docker-registry.default.svc:5000/openshift/postgresql                                     latest,10,9.2 + 3 more...     2 hours ago
python                                         docker-registry.default.svc:5000/openshift/python                                         2.7,3.3,3.4 + 3 more...       2 hours ago
redhat-openjdk18-openshift                     docker-registry.default.svc:5000/openshift/redhat-openjdk18-openshift                     1.0,1.1,1.2 + 2 more...       2 hours ago
redhat-sso70-openshift                         docker-registry.default.svc:5000/openshift/redhat-sso70-openshift                         1.3,1.4                       2 hours ago
redhat-sso71-openshift                         docker-registry.default.svc:5000/openshift/redhat-sso71-openshift                         1.1,1.2,1.3 + 1 more...       2 hours ago
redhat-sso72-openshift                         docker-registry.default.svc:5000/openshift/redhat-sso72-openshift                         1.0,1.1,1.2                   2 hours ago
redis                                          docker-registry.default.svc:5000/openshift/redis                                          3.2,latest                    2 hours ago
rhdm70-decisioncentral-openshift               docker-registry.default.svc:5000/openshift/rhdm70-decisioncentral-openshift               1.0,1.1                       2 hours ago
rhdm70-kieserver-openshift                     docker-registry.default.svc:5000/openshift/rhdm70-kieserver-openshift                     1.0,1.1                       2 hours ago
rhdm71-controller-openshift                    docker-registry.default.svc:5000/openshift/rhdm71-controller-openshift                    1.0,1.1                       2 hours ago
rhdm71-decisioncentral-indexing-openshift      docker-registry.default.svc:5000/openshift/rhdm71-decisioncentral-indexing-openshift      1.0,1.1                       2 hours ago
rhdm71-decisioncentral-openshift               docker-registry.default.svc:5000/openshift/rhdm71-decisioncentral-openshift               1.1,1.0                       2 hours ago
rhdm71-kieserver-openshift                     docker-registry.default.svc:5000/openshift/rhdm71-kieserver-openshift                     1.0,1.1                       2 hours ago
rhdm71-optaweb-employee-rostering-openshift    docker-registry.default.svc:5000/openshift/rhdm71-optaweb-employee-rostering-openshift    1.0,1.1                       2 hours ago
rhdm72-controller-openshift                    docker-registry.default.svc:5000/openshift/rhdm72-controller-openshift                    1.0,1.1                       2 hours ago
rhdm72-decisioncentral-indexing-openshift      docker-registry.default.svc:5000/openshift/rhdm72-decisioncentral-indexing-openshift      1.0,1.1                       2 hours ago
rhdm72-decisioncentral-openshift               docker-registry.default.svc:5000/openshift/rhdm72-decisioncentral-openshift               1.1,1.0                       2 hours ago
rhdm72-kieserver-openshift                     docker-registry.default.svc:5000/openshift/rhdm72-kieserver-openshift                     1.0,1.1                       2 hours ago
rhdm72-optaweb-employee-rostering-openshift    docker-registry.default.svc:5000/openshift/rhdm72-optaweb-employee-rostering-openshift    1.0,1.1                       2 hours ago
rhpam70-businesscentral-indexing-openshift     docker-registry.default.svc:5000/openshift/rhpam70-businesscentral-indexing-openshift     1.0,1.1,1.2                   2 hours ago
rhpam70-businesscentral-monitoring-openshift   docker-registry.default.svc:5000/openshift/rhpam70-businesscentral-monitoring-openshift   1.1,1.2,1.0                   2 hours ago
rhpam70-businesscentral-openshift              docker-registry.default.svc:5000/openshift/rhpam70-businesscentral-openshift              1.0,1.1,1.2                   2 hours ago
rhpam70-controller-openshift                   docker-registry.default.svc:5000/openshift/rhpam70-controller-openshift                   1.0,1.1,1.2                   2 hours ago
rhpam70-kieserver-openshift                    docker-registry.default.svc:5000/openshift/rhpam70-kieserver-openshift                    1.0,1.1,1.2                   2 hours ago
rhpam70-smartrouter-openshift                  docker-registry.default.svc:5000/openshift/rhpam70-smartrouter-openshift                  1.0,1.1,1.2                   2 hours ago
rhpam71-businesscentral-indexing-openshift     docker-registry.default.svc:5000/openshift/rhpam71-businesscentral-indexing-openshift     1.0,1.1                       2 hours ago
rhpam71-businesscentral-monitoring-openshift   docker-registry.default.svc:5000/openshift/rhpam71-businesscentral-monitoring-openshift   1.0,1.1                       2 hours ago
rhpam71-businesscentral-openshift              docker-registry.default.svc:5000/openshift/rhpam71-businesscentral-openshift              1.0,1.1                       2 hours ago
rhpam71-controller-openshift                   docker-registry.default.svc:5000/openshift/rhpam71-controller-openshift                   1.0,1.1                       2 hours ago
rhpam71-kieserver-openshift                    docker-registry.default.svc:5000/openshift/rhpam71-kieserver-openshift                    1.0,1.1                       2 hours ago
rhpam71-smartrouter-openshift                  docker-registry.default.svc:5000/openshift/rhpam71-smartrouter-openshift                  1.0,1.1                       2 hours ago
rhpam72-businesscentral-indexing-openshift     docker-registry.default.svc:5000/openshift/rhpam72-businesscentral-indexing-openshift     1.1,1.0                       2 hours ago
rhpam72-businesscentral-monitoring-openshift   docker-registry.default.svc:5000/openshift/rhpam72-businesscentral-monitoring-openshift   1.0,1.1                       2 hours ago
rhpam72-businesscentral-openshift              docker-registry.default.svc:5000/openshift/rhpam72-businesscentral-openshift              1.0,1.1                       2 hours ago
rhpam72-controller-openshift                   docker-registry.default.svc:5000/openshift/rhpam72-controller-openshift                   1.0,1.1                       2 hours ago
rhpam72-kieserver-openshift                    docker-registry.default.svc:5000/openshift/rhpam72-kieserver-openshift                    1.0,1.1                       2 hours ago
rhpam72-smartrouter-openshift                  docker-registry.default.svc:5000/openshift/rhpam72-smartrouter-openshift                  1.0,1.1                       2 hours ago
ruby                                           docker-registry.default.svc:5000/openshift/ruby                                           2.2,2.3,2.4 + 3 more...       2 hours ago

Happy playing …

This article Bringing up an OpenShift playground in AWS first appeared on the dbi services blog.

WebLogic – Update on the WLST monitoring

Sun, 2019-04-14 11:05

A few years ago, I wrote this blog about a WLST script to monitor a WebLogic Server. At that time, we were managing a Documentum Platform with 115 servers; now it's more than 700 servers, so I wanted to come back in this blog with an update on the WLST script.

1. Update of the WLST script needed

Over the past two years, we installed a lot of new servers with a lot of new components. Some of these components required us to slightly adapt our monitoring solution to be able to handle the monitoring in the same, efficient way for all servers of our platform: we want a single solution which fits all cases. The new cases we came across were WebLogic Clustering as well as EAR applications.

In the past, we only had WAR files related to Documentum: D2.war, da.war, D2-REST.war, aso… All these WAR files are quite simple to monitor because one "ApplicationRuntimes" equals one "ComponentRuntimes" (I'm talking here about the WLST script from the previous blog). So basically, if you want to check the number of open sessions [get('OpenSessionsCurrentCount')] or the total amount of sessions [get('SessionsOpenedTotalCount')], it's just one value. EAR files often contain WAR file(s) as well as other components, so in this case you potentially have a lot of "ComponentRuntimes" for each "ApplicationRuntimes". Therefore, the best way I found to keep a single monitoring solution for all WebLogic Servers, no matter what application is deployed on them, was to loop over each component, cumulate the number of open (respectively total) sessions for each component, and then return that for the application.

In addition to that, we also started to deploy some WebLogic Servers in a cluster, so the monitoring script needed to take that into account. The previous version of the WLST script assumed that the deployment target was a single local Managed Server (local to the AdminServer). With a WLS cluster, the deployment target can be a cluster, in which case the WLST script wouldn't find the correct monitoring value. I therefore introduced a check on whether the application is deployed on a cluster and, if so, I select the deployment on the local Managed Server that is part of this cluster. We are using the NodeManager Listen Address to know if a Managed Server is a local one, so it expects both the NodeManager and the Managed Server to use the same Listen Address.

As a side note, in case you have a WebLogic Cluster that is deploying an Application only on certain machines of the WebLogic Domain (so for example you have 3 machines but a cluster only targets 2 of them), then on the machine(s) where the Application isn’t deployed by the WebLogic Cluster, the monitoring will still try to find the Application on a local Managed Server and it will not succeed. This will still create a log file for this Application with the following content: “CRITICAL – The Managed Server ‘ + appTargetName + ‘ or the Application ‘ + app.getName() + ‘ is not started”. This is expected since the Application isn’t deployed there but it’s then your job to either set the monitoring tool to expect a CRITICAL or just not check this specific log file for this machine.

Finally, the last modification I did was to use a properties file instead of embedded properties, because we are now deploying more and more WebLogic Servers with our silent scripts (it takes a few minutes to have a WLS fully installed and configured, with clustering, SSL, aso…) and it is easier to have a properties file per WebLogic Domain that is used by our WebLogic Servers as well as by the monitoring system to know what's installed, whether it's a cluster, where the AdminServer is, whether it's using t3 or t3s, aso…

2. WebLogic Domain properties file

As mentioned above, we started to use properties files with our silent scripts to describe what is installed on the local server, aso… This is an extract of a domain.properties file that we are using:

[weblogic@weblogic_server_01 ~]$ cat /app/weblogic/wlst/domain.properties
...
NM_HOST=weblogic_server_01.dbi-services.com
ADMIN_URL=t3s://weblogic_server_01.dbi-services.com:8443
DOMAIN_NAME=MyDomain
...
CLUSTERS=clusterWS-01:msWS-011,machine-01,weblogic_server_01.dbi-services.com,8080,8081:msWS-012,machine-02,weblogic_server_02.dbi-services.com,8080,8081|clusterWS-02:msWS-021,machine-01,weblogic_server_01.dbi-services.com,8082,8083:msWS-022,machine-02,weblogic_server_02.dbi-services.com,8082,8083
...
[weblogic@weblogic_server_01 ~]$

The parameter “CLUSTERS” in this properties file is composed in the following way:

  • If it’s a WebLogic Domain with Clustering: CLUSTERS=cluster1:ms11,machine11,listen11,http11,https11:ms12,machine12,…|cluster2:ms21,machine21,…:ms22,machine22,…:ms23,machine23,…
    • ms11 and ms12 being 2 Managed Servers part of the cluster cluster1
    • ms21, ms22 and ms23 being 3 Managed Servers part of the cluster cluster2
  • If it’s not a WebLogic Domain with Clustering: CLUSTERS= (equal nothing, it’s empty, not needed)

There are other properties in this domain.properties of ours like the config and key secure files that WebLogic is using (different from the Nagios ones), the NodeManager configuration (port, type, config & key secure files as well) and a few other things about the AdminServer, the list of Managed Servers, aso… But all these properties aren’t needed for the monitoring topic so I’m only showing the ones that make sense.

3. New version of the WLST script

Enough talk, I assume you came here for the WLST script so here it is. I highlighted below what changed compared to the previous version so you can spot easily how the customization was done:

[nagios@weblogic_server_01 ~]$ cat /app/nagios/etc/objects/scripts/MyDomain_check_weblogic.wls
# WLST
# Identification: check_weblogic.wls  v1.2  15/08/2018
#
# File: check_weblogic.wls
# Purpose: check if a WebLogic Server is running properly
# Author: dbi services (Morgan Patou)
# Version: 1.0 23/03/2016
# Version: 1.1 14/06/2018 - re-formatting
# Version: 1.2 15/08/2018 - including cluster & EAR support
#
###################################################

from java.io import File
from java.io import FileOutputStream

import re

properties='/app/weblogic/wlst/domain.properties'

try:
  loadProperties(properties)
except:
  exit()

directory='/app/nagios/etc/objects/scripts'
userConfig=directory + '/' + DOMAIN_NAME + '_configfile.secure'
userKey=directory + '/' + DOMAIN_NAME + '_keyfile.secure'

try:
  connect(userConfigFile=userConfig, userKeyFile=userKey, url=ADMIN_URL)
except:
  exit()

def setOutputToFile(fileName):
  outputFile=File(fileName)
  fos=FileOutputStream(outputFile)
  theInterpreter.setOut(fos)

def setOutputToNull():
  outputFile=File('/dev/null')
  fos=FileOutputStream(outputFile)
  theInterpreter.setOut(fos)

def getLocalServerName(clustername):
  localServerName=""
  for clusterList in CLUSTERS.split('|'):
    found=0
    for clusterMember in clusterList.split(':'):
      if found == 1:
        clusterMemberDetails=clusterMember.split(',')
        if clusterMemberDetails[2] == NM_HOST:
          localServerName=clusterMemberDetails[0]
      if clusterMember == clustername:
        found=1
  return localServerName

while 1:
  domainRuntime()
  for server in domainRuntimeService.getServerRuntimes():
    setOutputToFile(directory + '/wl_threadpool_' + domainName + '_' + server.getName() + '.out')
    cd('/ServerRuntimes/' + server.getName() + '/ThreadPoolRuntime/ThreadPoolRuntime')
    print 'threadpool_' + domainName + '_' + server.getName() + '_OUT',get('ExecuteThreadTotalCount'),get('HoggingThreadCount'),get('PendingUserRequestCount'),get('CompletedRequestCount'),get('Throughput'),get('HealthState')
    setOutputToNull()
    setOutputToFile(directory + '/wl_heapfree_' + domainName + '_' + server.getName() + '.out')
    cd('/ServerRuntimes/' + server.getName() + '/JVMRuntime/' + server.getName())
    print 'heapfree_' + domainName + '_' + server.getName() + '_OUT',get('HeapFreeCurrent'),get('HeapSizeCurrent'),get('HeapFreePercent')
    setOutputToNull()

  try:
    setOutputToFile(directory + '/wl_sessions_' + domainName + '_console.out')
    cd('/ServerRuntimes/AdminServer/ApplicationRuntimes/consoleapp/ComponentRuntimes/AdminServer_/console')
    print 'sessions_' + domainName + '_console_OUT',get('OpenSessionsCurrentCount'),get('SessionsOpenedTotalCount')
    setOutputToNull()
  except WLSTException,e:
    setOutputToFile(directory + '/wl_sessions_' + domainName + '_console.out')
    print 'CRITICAL - The Server AdminServer or the Administrator Console is not started'
    setOutputToNull()

  domainConfig()
  for app in cmo.getAppDeployments():
    domainConfig()
    cd('/AppDeployments/' + app.getName())
    for appTarget in cmo.getTargets():
      if appTarget.getType() == "Cluster":
        appTargetName=getLocalServerName(appTarget.getName())
      else:
        appTargetName=appTarget.getName()
      print appTargetName
      domainRuntime()
      try:
        setOutputToFile(directory + '/wl_sessions_' + domainName + '_' + app.getName() + '.out')
        cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
        openSessions=0
        totalSessions=0
        for appComponent in cmo.getComponentRuntimes():
          result=re.search(appTargetName,appComponent.getName())
          if result != None:
            cd('ComponentRuntimes/' + appComponent.getName())
            try:
              openSessions+=get('OpenSessionsCurrentCount')
              totalSessions+=get('SessionsOpenedTotalCount')
            except WLSTException,e:
              cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
            cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
        print 'sessions_' + domainName + '_' + app.getName() + '_OUT',openSessions,totalSessions
        setOutputToNull()
      except WLSTException,e:
        setOutputToFile(directory + '/wl_sessions_' + domainName + '_' + app.getName() + '.out')
        print 'CRITICAL - The Managed Server ' + appTargetName + ' or the Application ' + app.getName() + ' is not started'
        setOutputToNull()

  java.lang.Thread.sleep(120000)

[nagios@weblogic_server_01 ~]$

 

For all our WAR files, even if the WLST script changed, the outcome is the same since there is only one component; for the EAR files, it will just add all of the open sessions into a global count. Obviously, this doesn't necessarily represent the real number of "user" sessions, but it is an estimation of the load. We do not really care about a specific number; we want to see how the load evolves during the day, and we can adjust our thresholds to take into account that it's not just a single component's sessions but a global count.
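On the monitoring side, each generated .out file can then be evaluated by a small check command; a minimal sketch (the file name and the thresholds below are assumptions, not our actual configuration):

#!/bin/bash
# hypothetical check reading one of the session files produced by the WLST script
FILE=/app/nagios/etc/objects/scripts/wl_sessions_MyDomain_D2.out
WARN=200; CRIT=400
LINE=$(cat "${FILE}")
case "${LINE}" in
  CRITICAL*) echo "${LINE}"; exit 2 ;;
esac
OPEN=$(echo "${LINE}" | awk '{print $2}')
if [ "${OPEN}" -ge "${CRIT}" ]; then echo "CRITICAL - ${OPEN} open sessions"; exit 2
elif [ "${OPEN}" -ge "${WARN}" ]; then echo "WARNING - ${OPEN} open sessions"; exit 1
else echo "OK - ${OPEN} open sessions"; exit 0
fi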

You can obviously tweak the script to match your needs but this is working pretty well for us on all our environments. If you have ideas about what could be updated to make it even better, don’t hesitate to share!

 

This article WebLogic – Update on the WLST monitoring first appeared on the dbi services blog.

Documentum – RCS/CFS installation failure

Sun, 2019-04-14 11:00

A few weeks ago, I had the task to add a new CS into existing HA environments (DEV/TEST/PROD) to better support the load on these environments, as well as adding a new repository on all Content Servers. These environments were installed nearly two years ago already, so it was really just adding something new into the picture. The installation of the new repository on the existing Content Servers (CS1 / CS2) was successful and without much trouble (installation in silent mode obviously, so it's fast & reliable for the CS and RCS), but then the new Remote Content Server (RCS/CFS – CS3) installation, using the same silent scripts, failed for the two existing/old repositories while it succeeded for the new one.

Well actually, the CFS installation didn’t completely fail. The silent installer returned the prompt properly, the repository start/stop scripts were present, the config folder was present, the dm_server_config object was there, and so on. So it looked like the installation was successful but, as a best practice, it is really important to always check the log file of a silent installation because it doesn’t show anything on the prompt, even if there are errors. So while checking the log file after the silent installer returned the prompt, I saw the following:

[dmadmin@content_server_03 ~]$ cd $DM_HOME/install/logs/
[dmadmin@content_server_03 logs]$ cat install.log
15:12:31,830  INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - Done InitializeSharedLibrary ...
15:12:31,870  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCfsInitializeImportantServerVariables - The installer is gathering system configuration information.
15:12:31,883  INFO [main] com.documentum.install.server.installanywhere.actions.DiWASilentRemoteServerValidation - Start to verify the password
15:12:33,259  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:12:33,635  INFO [main] com.documentum.fc.client.security.internal.CreateIdentityCredential$MultiFormatPKIKeyPair - generated RSA (2,048-bit strength) mutiformat key pair in 352 ms
15:12:33,667  INFO [main] com.documentum.fc.client.security.internal.CreateIdentityCredential - certificate created for DFC <CN=dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa,O=EMC,OU=Documentum> valid from Fri Feb 01 15:07:33 UTC 2019 to Mon Jan 29 15:12:33 UTC 2029:

15:12:33,668  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:12:33,681  INFO [main] com.documentum.fc.client.security.impl.InitializeKeystoreForDfc - [DFC_SECURITY_IDENTITY_INITIALIZED] Initialized new identity in keystore, DFC alias=dfc, identity=dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:12:33,682  INFO [main] com.documentum.fc.client.security.impl.AuthenticationMgrForDfc - identity for authentication is dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:12:33,687  INFO [main] com.documentum.fc.impl.RuntimeContext - DFC Version is 7.3.0040.0025
15:12:33,939  INFO [Timer-2] com.documentum.fc.client.impl.bof.cache.ClassCacheManager$CacheCleanupTask - [DFC_BOF_RUNNING_CLEANUP] Running class cache cleanup task
15:12:34,717  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:12:34,758  INFO [main] com.documentum.fc.client.security.internal.AuthenticationMgr - new identity bundle <dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa   1549033954      content_server_03.dbi-services.com         hicAAvU7QX3VNvDft2PwmnW4SIFX+5Snx7PlA5hryuOpo2eWLcEANYAEwYBbU6F3hEBAMenRR/lXFrHFqlrxTZl54whGL+9VnH6CCEu4x8dxdQ+QLRE3EtLlO31SPNhqkzjyVwhktNuivhiZkxweDNynvk+pDleTPvzUvF0YSoggcoiEq+kGr6/c9vUPOMuuv1k7PR1AO05JHmu7vea9/UBaV+TFA6/cGRwVh5i5D2s1Ws7qiDlBl4R+Wp3+TbNLPjbn/SeOz5ZSjAmXThK0H0RXwbcwHo9bVm0Hzu/1n7silII4ZzjAW7dd5Jvbxb66mxC8NWaNabPksus2mTIBhg==>
15:12:35,002  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:12:35,119  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: false
15:12:36,317  INFO [main] com.documentum.fc.client.privilege.impl.PublicKeyCertificate - stored certificate for CN
15:12:36,353  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling in GR_DocBase a new record with this persistent certificate:
-----BEGIN CERTIFICATE-----
MIIDHzCCAgcCELGIh8FYcycggMmImLESjEYwDQYJKoZIhvcNAQELBQAwTjETMBEG
YXZxbFJuN1lRZFlUTXRQNnBWNnpRY3JBYTAeFw0xOTAyMDExNTA3MzNaFw0yOTAx
MjkxNTEyMzNaME4xEzARBgNVBAsMCkRvY3VtZW50dW0xDDAKBgNVBAoMA0VNQzEp
hKnQmaMo/wCv+QXZTCsitrBNvoomcT82mYzwIxV5/7cPCIHHMcJijsJCtunjiucV
MCcGA1UEAwwgZGZjX1VuSWF2cWxSbjdZUWRZVE10UDZwVjZ6UWNyQWEwggEiMA0G
HcL0KUImSV7owDqKzV3lEYCGdomX4gYTI5bMKAiTEuGyWRKw2YTQGhfp5y0mU0hV
ORTYyRoGjpRUuXWpdrsrbX8g8gD9l6ijWTSIWfTGO/7//mTHp2zwp/TiIEuAS+RA
eFw1pBLSCKneYgquMuiyFfuCfBVNY5Q0MzyPHYxrDAp4CtjasIrNT5h3AgMBAAEw
CSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC4Hli+niUAD0ksVVWocPnvzV10ZOj2
DQYJKoZIhvcNAQELBQADggEBAEAre45NEpqzGMMYX1zpjgib9wldSmiPVDZbhj17
KnUCgDy7FhFQ5U5w6wf2iO9UxGV42AYQe2TjED0EbYwpYB8DC970J2ZrjZRFMy/Y
A1UECwwKRG9jdW1lbnR1bTEMMAoGA1UECgwDRU1DMSkwJwYDVQQDDCBkZmNfVW5J
gwKynVf9O10GQP0a8Z6Fr3jrtCEzfLjOXN0VxEcgwOEKRWHM4auxjevqGCPegD+y
FVWwylyIsMRsC9hOxoNHZPrbhk3N9Syhqsbl+Z9WXG0Sp4uh1z5R1NwVhR7YjZkF
19cfN8uEHqedJo26lq7oFF2KLJ+/8sWrh2a6lrb4fNXYZIAaYKjAjsUzcejij8en
Rd8yvghCc4iwWvpiRg9CW0VF+dXg6KkQmaFjiGrVosskUjuACHncatiYC5lDNJy+
TDdnNWYlctfWcT8WL/hX6FRGedT9S30GShWJNobM9vECoNg=
-----END CERTIFICATE-----
15:12:36,355  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: false
15:12:36,535  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling a new registration record for dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:12:36,563  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - [DFC_SECURITY_GR_REGISTRATION_PUBLISH] this dfc instance is now published in the global registry GR_DocBase
15:12:37,513  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:12:38,773  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:12:39,314  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is adding it as primary connection broker and moves existing primary as backup.
15:12:41,643  INFO [main]  - The installer updates dfc.properties file.
15:12:41,644  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is adding it as primary connection broker and moves existing primary as backup.
15:12:41,649  INFO [main] com.documentum.install.server.installanywhere.actions.DiWAServerEnableLockBoxValidation - The installer will validate AEK/Lockbox fileds.
15:12:41,656  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is changing primary as backup and backup as primary.
15:12:43,874  INFO [main]  - The installer updates dfc.properties file.
15:12:43,874  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is changing primary as backup and backup as primary.
15:12:43,876  INFO [main]  - The installer is creating folders for the selected repository.
15:12:43,876  INFO [main]  - Checking if cfs is being installed on the primary server...
15:12:43,877  INFO [main]  - CFS is not being installed on the primary server
15:12:43,877  INFO [main]  - Installer creates necessary directory structure.
15:12:43,879  INFO [main]  - Installer copies aek.key, server.ini, dbpasswd.txt and webcache.ini files from primary server.
15:12:43,881  INFO [main]  - Installer executes dm_rcs_copyfiles.ebs to get files from primary server
15:12:56,295  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/dbpasswd.txt has been created successfully
15:12:56,302  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/webcache.ini has been created successfully
15:12:56,305  INFO [main]  - Installer found exising file $DOCUMENTUM/dba/secure/lockbox.lb
15:12:56,305  INFO [main]  - Installer renamed exising file $DOCUMENTUM/dba/secure/lockbox.lb to $DOCUMENTUM/dba/secure/lockbox.lb.bak.3
15:12:56,306  INFO [main]  - $DOCUMENTUM/dba/secure/lockbox.lb has been created successfully
15:12:56,927  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/server_content_server_03_DocBase1.ini has been created successfully
15:12:56,928  INFO [main]  - Installer found exising file $DOCUMENTUM/dba/castore_license
15:12:56,928  INFO [main]  - Installer renamed exising file $DOCUMENTUM/dba/castore_license to $DOCUMENTUM/dba/castore_license.bak.3
15:12:56,928  INFO [main]  - $DOCUMENTUM/dba/castore_license has been created successfully
15:12:56,931  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/ldap_080f123450006deb.cnt has been created successfully
15:12:56,934  INFO [main]  - Installer updates server.ini
15:12:56,940  INFO [main]  - The installer tests database connection.
15:12:57,675  INFO [main]  - Database successfully opened.
Test table successfully created.
Test view successfully created.
Test index successfully created.
Insert into table successfully done.
Index successfully dropped.
View successfully dropped.
Database case sensitivity test successfully past.
Table successfully dropped.
15:13:00,675  INFO [main]  - The installer creates server config object.
15:13:00,853  INFO [main]  - The installer is starting a process for the repository.
15:13:01,993  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:03,079  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:04,149  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:05,187  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:06,256  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:14:06,352  INFO [main]  - Waiting for repository DocBase1.content_server_03_DocBase1 to start up.
15:14:25,003  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:14:25,495  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:14:25,498  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:14:25,513  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: true
15:14:25,672  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - assigning rights to all roles for this client on DocBase1
15:14:25,682  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - found client rights: false
15:14:25,736  INFO [main] com.documentum.fc.client.privilege.impl.PublicKeyCertificate - stored certificate for CN
15:14:25,785  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling in DocBase1 a new record with this persistent certificate:
-----BEGIN CERTIFICATE-----
MIIDHzCCAgcCELGIh8FYcycggMmImLESjEYwDQYJKoZIhvcNAQELBQAwTjETMBEG
YXZxbFJuN1lRZFlUTXRQNnBWNnpRY3JBYTAeFw0xOTAyMDExNTA3MzNaFw0yOTAx
MjkxNTEyMzNaME4xEzARBgNVBAsMCkRvY3VtZW50dW0xDDAKBgNVBAoMA0VNQzEp
hKnQmaMo/wCv+QXZTCsitrBNvoomcT82mYzwIxV5/7cPCIHHMcJijsJCtunjiucV
MCcGA1UEAwwgZGZjX1VuSWF2cWxSbjdZUWRZVE10UDZwVjZ6UWNyQWEwggEiMA0G
HcL0KUImSV7owDqKzV3lEYCGdomX4gYTI5bMKAiTEuGyWRKw2YTQGhfp5y0mU0hV
ORTYyRoGjpRUuXWpdrsrbX8g8gD9l6ijWTSIWfTGO/7//mTHp2zwp/TiIEuAS+RA
eFw1pBLSCKneYgquMuiyFfuCfBVNY5Q0MzyPHYxrDAp4CtjasIrNT5h3AgMBAAEw
CSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC4Hli+niUAD0ksVVWocPnvzV10ZOj2
DQYJKoZIhvcNAQELBQADggEBAEAre45NEpqzGMMYX1zpjgib9wldSmiPVDZbhj17
KnUCgDy7FhFQ5U5w6wf2iO9UxGV42AYQe2TjED0EbYwpYB8DC970J2ZrjZRFMy/Y
A1UECwwKRG9jdW1lbnR1bTEMMAoGA1UECgwDRU1DMSkwJwYDVQQDDCBkZmNfVW5J
gwKynVf9O10GQP0a8Z6Fr3jrtCEzfLjOXN0VxEcgwOEKRWHM4auxjevqGCPegD+y
FVWwylyIsMRsC9hOxoNHZPrbhk3N9Syhqsbl+Z9WXG0Sp4uh1z5R1NwVhR7YjZkF
19cfN8uEHqedJo26lq7oFF2KLJ+/8sWrh2a6lrb4fNXYZIAaYKjAjsUzcejij8en
Rd8yvghCc4iwWvpiRg9CW0VF+dXg6KkQmaFjiGrVosskUjuACHncatiYC5lDNJy+
TDdnNWYlctfWcT8WL/hX6FRGedT9S30GShWJNobM9vECoNg=
-----END CERTIFICATE-----
15:14:25,789  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: true
15:14:25,802  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - found client rights: false
15:14:25,981  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling a new rights record for dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:14:26,032  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - [DFC_SECURITY_DOCBASE_RIGHTS_REGISTER] this dfc instance has now escalation rights registered with docbase DocBase1
15:14:26,052  INFO [main] com.documentum.install.appserver.jboss.JbossApplicationServer - setApplicationServer sharedDfcLibDir is:$DOCUMENTUM/shared/dfc
15:14:26,052  INFO [main] com.documentum.install.appserver.jboss.JbossApplicationServer - getFileFromResource for templates/appserver.properties
15:14:26,059  INFO [main] com.documentum.install.server.installanywhere.actions.DiWAServerAddDocbaseEntryToWebXML - BPM webapp does not exist.
15:14:26,191  INFO [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - Executing the Docbase HeadStart script.
15:14:36,202  INFO [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - Executing the Creates ACS config object script.
15:14:46,688  INFO [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - Executing the This script does miscellaneous setup tasks for remote content servers script.
15:14:56,840 ERROR [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - The installer failed to execute the This script does miscellaneous setup tasks for remote content servers script. For more information, please read output file: $DOCUMENTUM/dba/config/DocBase1/dm_rcs_setup.out.
com.documentum.install.shared.common.error.DiException: The installer failed to execute the This script does miscellaneous setup tasks for remote content servers script. For more information, please read output file: $DOCUMENTUM/dba/config/DocBase1/dm_rcs_setup.out.
        at com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2.setup(DiWAServerProcessingScripts2.java:98)
        at com.documentum.install.shared.installanywhere.actions.InstallWizardAction.install(InstallWizardAction.java:75)
        at com.zerog.ia.installer.actions.CustomAction.installSelf(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.an(Unknown Source)
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.am(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.runNextInstallPiece(Unknown Source)
        ...
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.runPreInstall(Unknown Source)
        at com.zerog.ia.installer.LifeCycleManager.consoleInstallMain(Unknown Source)
        at com.zerog.ia.installer.LifeCycleManager.executeApplication(Unknown Source)
        at com.zerog.ia.installer.Main.main(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.zerog.lax.LAX.launch(Unknown Source)
        at com.zerog.lax.LAX.main(Unknown Source)
15:14:56,843  INFO [main]  - The INSTALLER_UI value is SILENT
15:14:56,843  INFO [main]  - The KEEP_TEMP_FILE value is true
15:14:56,843  INFO [main]  - The common.installOwner.password value is ******
15:14:56,843  INFO [main]  - The SERVER.SECURE.ROOT_PASSWORD value is ******
15:14:56,843  INFO [main]  - The common.upgrade.aek.lockbox value is null
15:14:56,843  INFO [main]  - The common.old.aek.passphrase.password value is null
15:14:56,843  INFO [main]  - The common.aek.algorithm value is AES_256_CBC
15:14:56,843  INFO [main]  - The common.aek.passphrase.password value is ******
15:14:56,843  INFO [main]  - The common.aek.key.name value is CSaek
15:14:56,843  INFO [main]  - The common.use.existing.aek.lockbox value is null
15:14:56,843  INFO [main]  - The SERVER.ENABLE_LOCKBOX value is true
15:14:56,844  INFO [main]  - The SERVER.LOCKBOX_FILE_NAME value is lockbox.lb
15:14:56,844  INFO [main]  - The SERVER.LOCKBOX_PASSPHRASE.PASSWORD value is ******
15:14:56,844  INFO [main]  - The SERVER.COMPONENT_ACTION value is CREATE
15:14:56,844  INFO [main]  - The SERVER.DOCBROKER_ACTION value is null
15:14:56,844  INFO [main]  - The SERVER.PRIMARY_CONNECTION_BROKER_HOST value is content_server_01.dbi-services.com
15:14:56,844  INFO [main]  - The SERVER.PRIMARY_CONNECTION_BROKER_PORT value is 1489
15:14:56,844  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_HOST value is content_server_03.dbi-services.com
15:14:56,844  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_PORT value is 1489
15:14:56,844  INFO [main]  - The SERVER.FQDN value is content_server_03.dbi-services.com
15:14:56,845  INFO [main]  - The SERVER.DOCBASE_NAME value is DocBase1
15:14:56,845  INFO [main]  - The SERVER.PRIMARY_SERVER_CONFIG_NAME value is DocBase1
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_USERNAME value is dmadmin
15:14:56,845  INFO [main]  - The SERVER.SECURE.REPOSITORY_PASSWORD value is ******
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_USER_DOMAIN value is
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_USERNAME_WITH_DOMAIN value is dmadmin
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_HOSTNAME value is content_server_01.dbi-services.com
15:14:56,845  INFO [main]  - The SERVER.CONNECTION_BROKER_NAME value is null
15:14:56,845  INFO [main]  - The SERVER.CONNECTION_BROKER_PORT value is null
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_NAME value is
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_PORT value is
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_CONNECT_MODE value is null
15:14:56,846  INFO [main]  - The SERVER.USE_CERTIFICATES value is false
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_KEYSTORE_FILE_NAME value is null
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_KEYSTORE_PASSWORD_FILE_NAME value is null
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_CIPHER_LIST value is null
15:14:56,853  INFO [main]  - The SERVER.DFC_SSL_TRUSTSTORE value is null
15:14:56,853  INFO [main]  - The SERVER.DFC_SSL_TRUSTSTORE_PASSWORD value is ******
15:14:56,853  INFO [main]  - The SERVER.DFC_SSL_USE_EXISTING_TRUSTSTORE value is null
15:14:56,853  INFO [main]  - The SERVER.CONNECTION_BROKER_SERVICE_STARTUP_TYPE value is null
15:14:56,854  INFO [main]  - The SERVER.DOCUMENTUM_DATA value is $DATA
15:14:56,854  INFO [main]  - The SERVER.DOCUMENTUM_SHARE value is $DOCUMENTUM/share
15:14:56,854  INFO [main]  - The CFS_SERVER_CONFIG_NAME value is content_server_03_DocBase1
15:14:56,854  INFO [main]  - The SERVER.DOCBASE_SERVICE_NAME value is DocBase1
15:14:56,854  INFO [main]  - The CLIENT_CERTIFICATE value is null
15:14:56,854  INFO [main]  - The RKM_PASSWORD value is ******
15:14:56,854  INFO [main]  - The SERVER.DFC_BOF_GLOBAL_REGISTRY_VALIDATE_OPTION_IS_SELECTED value is null
15:14:56,854  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_PORT_OTHER value is null
15:14:56,854  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_HOST_OTHER value is null
15:14:56,854  INFO [main]  - The SERVER.GLOBAL_REGISTRY_REPOSITORY value is null
15:14:56,854  INFO [main]  - The SERVER.BOF_REGISTRY_USER_LOGIN_NAME value is null
15:14:56,855  INFO [main]  - The SERVER.SECURE.BOF_REGISTRY_USER_PASSWORD value is ******
15:14:56,855  INFO [main]  - The SERVER.COMPONENT_ACTION value is CREATE
15:14:56,855  INFO [main]  - The SERVER.COMPONENT_NAME value is null
15:14:56,855  INFO [main]  - The SERVER.DOCBASE_NAME value is DocBase1
15:14:56,855  INFO [main]  - The SERVER.CONNECTION_BROKER_NAME value is null
15:14:56,855  INFO [main]  - The SERVER.CONNECTION_BROKER_PORT value is null
15:14:56,855  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_HOST value is content_server_03.dbi-services.com
15:14:56,855  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_PORT value is 1489
15:14:56,855  INFO [main]  - The SERVER.PRIMARY_SERVER_CONFIG_NAME value is DocBase1
15:14:56,855  INFO [main]  - The SERVER.DOCBROKER_NAME value is
15:14:56,856  INFO [main]  - The SERVER.DOCBROKER_PORT value is
15:14:56,856  INFO [main]  - The SERVER.CONNECTION_BROKER_SERVICE_STARTUP_TYPE value is null
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_USERNAME value is dmadmin
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_PASSWORD value is ******
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_USER_DOMAIN value is
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_USERNAME_WITH_DOMAIN value is dmadmin
15:14:56,856  INFO [main]  - The SERVER.DFC_BOF_GLOBAL_REGISTRY_VALIDATE_OPTION_IS_SELECTED_KEY value is null
15:14:56,856  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_PORT_OTHER value is null
15:14:56,856  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_HOST_OTHER value is null
15:14:56,856  INFO [main]  - The SERVER.GLOBAL_REGISTRY_REPOSITORY value is null
15:14:56,856  INFO [main]  - The SERVER.BOF_REGISTRY_USER_LOGIN_NAME value is null
15:14:56,856  INFO [main]  - The SERVER.SECURE.BOF_REGISTRY_USER_PASSWORD value is ******
15:14:56,856  INFO [main]  - The SERVER.COMPONENT_ACTION value is CREATE
15:14:56,857  INFO [main]  - The SERVER.COMPONENT_NAME value is null
15:14:56,857  INFO [main]  - The SERVER.PRIMARY_SERVER_CONFIG_NAME value is DocBase1
15:14:56,857  INFO [main]  - The SERVER.DOCBASE_NAME value is DocBase1
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_USERNAME value is dmadmin
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_PASSWORD value is ******
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_USER_DOMAIN value is
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_USERNAME_WITH_DOMAIN value is dmadmin
15:14:56,857  INFO [main]  - The env PATH value is: /usr/xpg4/bin:$DOCUMENTUM/shared/java64/JAVA_LINK/bin:$DM_HOME/bin:$DOCUMENTUM/dba:$ORACLE_HOME/bin:$DOCUMENTUM/shared/java64/JAVA_LINK/bin:$DM_HOME/bin:$DOCUMENTUM/dba:$ORACLE_HOME/bin:$DM_HOME/bin:$ORACLE_HOME/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/dmadmin/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin
[dmadmin@content_server_03 logs]$

 

As you can see above, everything was going well until the script “This script does miscellaneous setup tasks for remote content servers” was executed. Yes, that is a hell of a description, isn’t it? What this script actually does is run the “dm_rcs_setup.ebs” script (you can find it under $DM_HOME/install/admin/) on the repository to set up the remote jobs, project the RCS/CFS repository to the local docbroker, create the log folder and a few other things. Here is the content of the output file for the execution of this EBS:

[dmadmin@content_server_03 logs]$ cat $DOCUMENTUM/dba/config/DocBase1/dm_rcs_setup.out
Running dm_rcs_setup.ebs script on docbase DocBase1.content_server_03_DocBase1 to set up jobs for a remote content server.
docbaseNameOnly = DocBase1
Connected To DocBase1.content_server_03_DocBase1
$DOCUMENTUM/dba/log/000f1234/sysadmin was created.
Duplicating distributed jobs.
Creating job object for dm_ContentWarningcontent_server_03_DocBase1
Successfully created job object for dm_ContentWarningcontent_server_03_DocBase1
Creating job object for dm_LogPurgecontent_server_03_DocBase1
Successfully created job object for dm_LogPurgecontent_server_03_DocBase1
Creating job object for dm_ContentReplicationcontent_server_03_DocBase1
Successfully created job object for dm_ContentReplicationcontent_server_03_DocBase1
Creating job object for dm_DMCleancontent_server_03_DocBase1
The dm_DMClean job does not exist at the primary server so we will not create it at the remote site, either.
Failed to create job object for dm_DMCleancontent_server_03_DocBase1
[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_NO_MATCH]error:  "There was no match in the docbase for the qualification: dm_job where object_name = 'dm_DMClean' and lower(target_server) like lower('DocBase1.DocBase1@%')"


Exiting with return code (-1)
[dmadmin@content_server_03 logs]$
[dmadmin@content_server_03 logs]$

 

The RCS/CFS installation is failing because the creation of a remote job could not complete successfully. It worked properly for 3 out of the 5 remote jobs but not for the 2 remaining ones. Only one of those is shown in the log file because the installer did not even try to process the 2nd one: it had already failed and therefore stopped the installation at that point. That’s why the start/stop scripts were there, the log folder was there and the dm_server_config was OK as well, but some pieces were actually missing.

The issue here is that the RCS/CFS installation isn’t able to find the r_object_id of the “dm_DMClean” job (it mentions “Bad ID given: 0000000000000000”) and therefore it isn’t able to create the remote job. The last message is actually more interesting: “There was no match in the docbase for the qualification: dm_job where object_name = ‘dm_DMClean’ and lower(target_server) like lower(‘DocBase1.DocBase1@%’)”.

The RCS/CFS installation is actually looking for the job with the name ‘dm_DMClean’, which is OK, but it is also filtering on a target_server equal to ‘docbase_name.server_config_name@…’ and here, it doesn’t find any result.

 

So what happened? As I was saying in the introduction, this environment was already installed in HA several years ago. As a result, the jobs were already configured by us the way we expect them to be. We usually configure the jobs as follows (I’m only talking about the distributed jobs here):

Job Name on CS1           Job Status on CS1     Job Name on RCS%            Job Status on RCS%
dm_ContentWarning         Active                dm_ContentWarning%          Inactive
dm_LogPurge               Active                dm_LogPurge%                Active
dm_DMClean                Active                dm_DMClean%                 Inactive
dm_DMFilescan             Active                dm_DMFilescan%              Inactive
dm_ContentReplication     Inactive              dm_ContentReplication%      Inactive

Based on this, we usually disable dm_ContentReplication completely (if it isn’t needed) and we obviously leave dm_LogPurge enabled (all of them) with the target_server set to the local CS it is supposed to run on (so 1 job per CS). Then, for the 3 remaining jobs, it depends on the load of the environment. These jobs can be set to run on the CS1 by setting the target_server equal to ‘DocBase1.DocBase1@content_server_01.dbi-services.com’, or they can be set to run on ANY Content Server by setting an empty target_server (a single space: ‘ ’). It doesn’t matter where they run, but it is important that they do run, and hence setting them to ANY available Content Server is better so they aren’t bound to a single point of failure.

So the reason the RCS/CFS installation failed is that we had configured our jobs properly… Funny, right? As you could see in the logs, dm_ContentWarning was created properly, but that was only because someone had been testing with this job and it was temporarily set to run on the CS1 only; therefore, when the installer checked it, it was pure coincidence/luck that it could find it.

After the point of failure, there isn’t normally much left to do except creating the JMS config object, checking the ACS URLs and finally restarting the JMS. Still, it is cleaner to just remove the RCS/CFS, clean the repository objects that remain (the distributed jobs that were created) and then reinstall the RCS/CFS after setting the jobs as the installer expects them to be…
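For the record, here is a minimal sketch of how the target_server of the two problematic jobs could be pointed back to the primary server before re-running the silent installer. This is only an illustration: the password is a placeholder, the job list and the exact target_server value (built from the ‘docbase_name.server_config_name@host’ format used in the qualification above) must be adapted to your environment, and the original values should be noted beforehand so they can be restored once the RCS/CFS installation has succeeded:

[dmadmin@content_server_01 ~]$ idql DocBase1 -Udmadmin -P<password> << EOF
SELECT object_name, target_server FROM dm_job WHERE object_name IN ('dm_DMClean','dm_DMFilescan')
go
UPDATE dm_job OBJECTS SET target_server = 'DocBase1.DocBase1@content_server_01.dbi-services.com' WHERE object_name IN ('dm_DMClean','dm_DMFilescan')
go
EOF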

 

This article Documentum – RCS/CFS installation failure appeared first on Blog dbi services.

How to stop Documentum processes in a docker container, and more (part I)

Sat, 2019-04-13 07:00
How to stop Documentum processes in a docker container, and more

Ideally, but not mandatorily, the management of Documentum processes is performed at the service level, e.g. by systemd. In my blog here, I showed how to configure init files for Documentum under systemd. But containers don’t have systemd, yet. They just run processes, often only one, sometimes more if they are closely related (e.g. the docbroker, the method server and the content servers), so how can we replicate the same functionality with containers?
The topic of stopping processes in a docker container is abundantly discussed on-line (see for example the excellent article here). O/S signals are the magic solution, so much so that I should really have entitled this blog “Fun with the Signals”!
I’ll simply see here whether the presented approach can be applied in the particular case of a dockerized Documentum server. However, in order to keep things simple and quick, I won’t test a real dockerized Documentum installation but will rather use a script to simulate the Documentum processes, or any other processes for that matter, since it is so generic.
But first, why bother with this matter? During all the years that I have been administrating repositories, I’ve never noticed anything going wrong after restarting a suddenly stopped server, be it after an intentional kill, a pesky crash or an unplanned database unavailability. Evidently, the content server (CS henceforth) seems quite robust in this respect. Or maybe we were simply lucky so far. Personally, I don’t feel confident if I can’t cleanly shut down a process or service that must be stopped; some data might still be buffered in the CS’ memory and not flushing it properly might introduce inconsistencies or even corruption. The same goes when some unsuspected multi-step operation is in progress and gets aborted abruptly in the middle; ideally, transactions, if they are used, exist for this purpose, but anything can go wrong during a rollback. Killing a process is like slamming a door: it produces a lot of noise, vibrations in the walls, even damage in the long run, and always leaves a bad impression behind. Isn’t it more comforting to clean up and shut the door gently? Even then something can go wrong, but at least it will be through no fault of our own.

A few Reminders

When a “docker container stop” is issued, docker sends the SIGTERM signal to the process with PID == 1 running inside the container. That process, if programmed to do so, can then react to the signal and do anything seen fit, typically shutting the running processes down cleanly. After a 10-second grace period, the container is stopped manu militari. Documentum processes, to put it politely, don’t give a hoot about signals, except of course the well-known, unceremonious SIGKILL one. Thus, a proxy process must be introduced which will accept the signal and invoke the proper shutdown scripts to stop the CS processes, usually the dm_shutdown_* and dm_stop_* scripts, or a generic one that takes care of everything at start-up and at shutdown time.
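As a side note, if the default 10-second grace period turns out to be too short for a clean Documentum shutdown, it can be extended per stop command with its -t/--time option; a quick, hypothetical example (dctm is the container name used later in this article):

# give the container up to 120 seconds to react to SIGTERM before docker resorts to SIGKILL;
docker stop -t 120 dctm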
Said proxy must run with PID == 1, i.e. it must be the first process started in the container. Sort of: even if it is not the very first one, the PID 1 process can hand control over to it by using one of the exec() family functions; unlike a fork, exec replaces the image of the calling process with the new program while keeping the same PID, kind of like in the Matrix movies where agent Smith injects himself into someone else’s persona, if you will ;-). The main thing is that at some point the proxy ends up as PID 1. Luckily for us, we don’t have to bother with this complexity, for the dockerfile’s ENTRYPOINT[] clause takes care of everything.
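For illustration, here is a minimal sketch of such a hand-over done by hand in a hypothetical wrapper script started as the container’s first process (this is not needed with the exec-form ENTRYPOINT used later, it only shows the mechanism; /root/dctm.sh is the path used in part II):

#!/bin/bash
# hypothetical entrypoint wrapper running as PID 1;
# do some one-time preparation first ...
echo "preparing the environment ..."
# ... then replace this shell with the proxy script: exec keeps the current PID,
# so dctm.sh ends up running as PID 1 and receives docker's SIGTERM directly;
exec /root/dctm.sh start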
The proxy will also be the one that starts the CS. In addition, since it must wait for the SIGTERM signal, it must never exit. It can wait indefinitely listening on a fake input (e.g. tail -f /dev/null), wait for an illusory input in a detached container (e.g. while true; do read; done) or, better yet, do something useful like some light-weight monitoring.
While at it, the proxy process can listen to several conventional signals and react accordingly. For instance, a SIGUSR1 could mean “give me a docbroker docbase map” and a SIGUSR2 “restart the method server”. Admittedly, these actions could be performed directly by executing the relevant commands inside the container or from an outside command line, but the signal way is cheaper and, OK, funnier. So, let’s see how we can set all this up!

The implementation

As said, in order to focus on our topic, i.e. signal trapping, we’ve replaced the CS part with a simple simulation script, dctm.sh, that starts, stops and queries the status of dummy processes. It uses the bash shell and has been written under linux. Here it is:

#!/bin/bash
# launches in the background, or stops or queries the status of, a command with a conventional identification string in the prompt;
# the id is a random number determined during the start;
# it should be passed to enquiry the status of the started process or to stop it;
# Usage:
#   ./dctm.sh stop  | start | status 
# e.g.:
# $ ./dctm.sh start
# | process started with pid 13562 and random value 33699963
# $ psg 33699963
# | docker   13562     1  0 23:39 pts/0    00:00:00 I am number 33699963
# $ ./dctm.sh status 33699963
# $ ./dctm.sh stop 33699963
#
# cec - dbi-services - April 2019
#
trap 'help'         SIGURG
trap 'start_all'    SIGPWR
trap 'start_one'    SIGUSR1
trap 'status_all'   SIGUSR2
trap 'stop_all'     SIGINT SIGABRT
trap 'shutdown_all' SIGHUP SIGQUIT SIGTERM

verb="sleep"
export id_prefix="I am number"

func() {
   cmd="$1"
   case $cmd in
      start)
         # do something that sticks forever-ish, min ca. 20mn;
         (( params = 1111 * $RANDOM ))
         exec -a "$id_prefix" $verb $params &
         echo "process started with pid $! and random value $params"
         ;;
      stop)
         params=" $2"
         pid=$(ps -ajxf | gawk -v params="$params" '{if (match($0, " " ENVIRON["id_prefix"] params "$")) pid = $2} END {print (pid ? pid : "")}')
         if [[ ! -z $pid ]]; then
            kill -9 ${pid} &> /dev/null
            wait ${pid} &> /dev/null
         fi
         ;;
      status)
         params=" $2"
         read pid gid < <(ps -ajxf | gawk -v params="$params" '{if (match($0, " " ENVIRON["id_prefix"] params "$")) pid = $2 " " $3} END {print (pid ? pid : "")}')
         if [[ ! -z $pid ]]; then
            echo "random value${params} is used by process with pid $pid and pgid $gid"
         else
            echo "no such process running"
         fi
         ;;
   esac
}

help() {
   echo
   echo "send signal SIGURG for help"
   echo "send signal SIGPWR to start a few processes"
   echo "send signal SIGUSR1 to start a new process"
   echo "send signal SIGUSR2 for the list of started processes"
   echo "send signal SIGINT | SIGABRT  to stop all the processes"
   echo "send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container"
}

start_all() {
   echo; echo "starting a few processes at $(date +"%Y/%m/%d %H:%M:%S")"
   for loop in $(seq 5); do
      func start
   done

   # show them;
   echo; echo "started processes"
   ps -ajxf | grep "$id_prefix" | grep -v grep
}

start_one() {
   echo; echo "starting a new process at $(date +"%Y/%m/%d %H:%M:%S")"
   func start
}

status_all() {
   echo; echo "status of running processes at $(date +"%Y/%m/%d %H:%M:%S")"
   for no in $(ps -ef | grep "I am number " | grep -v grep | gawk '{print $NF}'); do
      echo "showing $no"
      func status $no
   done
}

stop_all() {
   echo; echo "shutting down the processes at $(date +"%Y/%m/%d %H:%M:%S")"
   for no in $(ps -ef | grep "I am number " | grep -v grep | gawk '{print $NF}'); do
      echo "stopping $no"
      func stop $no
   done
}

shutdown_all() {
   echo; echo "shutting down the container at $(date +"%Y/%m/%d %H:%M:%S")"
   stop_all
   exit 0
}

# -----------
# main;
# -----------

# starts a few dummy processes;
start_all

# display some usage explanation;
help

# make sure the container stays up and waits for signals;
while true; do read; done

The main part of the script starts a few processes, displays a help screen and then waits for input from stdin.
The script can be first tested outside a container as follows.
Run the script:

./dctm.sh

It will start a few easily distinguishable processes and display a help screen:

starting a few processes at 2019/04/06 16:05:35
process started with pid 17621 and random value 19580264
process started with pid 17622 and random value 19094757
process started with pid 17623 and random value 18211512
process started with pid 17624 and random value 3680743
process started with pid 17625 and random value 18198180
 
started processes
17619 17621 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 19580264
17619 17622 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 19094757
17619 17623 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 18211512
17619 17624 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 3680743
17619 17625 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 18198180
 
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container

Then, it will simply sit there and wait until it is asked to quit.
From another terminal, let’s check the started processes:

ps -ef | grep "I am number " | grep -v grep
docker 17621 17619 0 14:40 pts/0 00:00:00 I am number 19580264
docker 17622 17619 0 14:40 pts/0 00:00:00 I am number 19094757
docker 17623 17619 0 14:40 pts/0 00:00:00 I am number 18211512
docker 17624 17619 0 14:40 pts/0 00:00:00 I am number 3680743
docker 17625 17619 0 14:40 pts/0 00:00:00 I am number 18198180

Those processes could be Documentum ones or anything else; the point here is to control them from the outside, e.g. from another terminal session, in or out of a docker container. We will do that through O/S signals. The bash shell lets a script listen and react to signals through the trap command. At the top of the script, we have listed all the signals we’d like the script to react to:

trap 'help'         SIGURG
trap 'start_all'    SIGPWR
trap 'start_one'    SIGUSR1
trap 'status_all'   SIGUSR2
trap 'stop_all'     SIGINT SIGABRT
trap 'shutdown_all' SIGHUP SIGQUIT SIGTERM

It’s really a feast of traps!
The first line for example says that on receiving the SIGURG signal, the script’s function help() should be executed, no matter what the script was doing at that time, which in our case is just waiting for input from stdin.
The SIGPWR signal is interpreted as a request to start, in the background, another batch of five processes with the same naming convention, “I am number ” followed by a random number. The function start_all() is called on receiving this signal.
The SIGUSR1 signal starts one new process in the background. Function start_one() does just this.
The SIGUSR2 signal displays all the started processes so far by invoking function status_all().
The SIGINT and SIGABRT signals shut down all the processes started so far. Function stop_all() is called for this purpose.
Finally, the signals SIGHUP, SIGQUIT and SIGTERM all invoke function shutdown_all() to stop all the processes and exit the script.
Admittedly, this choice of signals is a bit of a stretch, but it is for the sake of the demonstration, so bear with us. Feel free to remap the signals to the functions any way you prefer.
Now, how do we send those signals? The ill-named kill command is here for that. Despite its name, nobody will be killed here, fortunately; signals are merely sent and the receiving processes decide how to react to them. Here, of course, we do react to them.
Here is its syntax (let’s use the --long-options for clarity):

/bin/kill --signal signal_name pid

Since bash has a built-in kill command that behaves differently, make sure to call the right program by specifying its full path name, /bin/kill.
Example of use:

/bin/kill --signal SIGURG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
# or shorter:
/bin/kill --signal SIGURG $(pgrep ^dctm.sh$)

The signal’s target is our test program dctm.sh, which, as far as kill is concerned, is identified through its PID.
Signals can be specified by their full name, e.g. SIGURG, SIGPWR, etc… or without the SIG prefix such as URG, PWR, etc … or even through their numeric value as shown below:

/bin/kill -L
1 HUP 2 INT 3 QUIT 4 ILL 5 TRAP 6 ABRT 7 BUS
8 FPE 9 KILL 10 USR1 11 SEGV 12 USR2 13 PIPE 14 ALRM
15 TERM 16 STKFLT 17 CHLD 18 CONT 19 STOP 20 TSTP 21 TTIN
22 TTOU 23 URG 24 XCPU 25 XFSZ 26 VTALRM 27 PROF 28 WINCH
29 POLL 30 PWR 31 SYS
 
or:
 
kill -L
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX

Thus, the following incantations are equivalent:

/bin/kill --signal SIGURG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal URG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal 23 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')

On receiving one of the supported signals, the related function is invoked and thereafter the script returns to its former activity, namely the loop that waits for a fake input. The loop is needed because otherwise the script would exit upon returning from a trap handler. In effect, the trap is processed like a function call and, on returning, control goes back to the statement following the point where the trap occurred. If there is none, the script terminates. Hence the loop.
Here is the output after sending a few signals; for clarity, the signals sent from another terminal have been manually inserted as highlighted comments before the output they caused.
Output terminal:

# SIGUSR2:
status of running processes at 2019/04/06 16:12:46
showing 28046084
random value 28046084 is used by process with pid 29248 and pgid 29245
showing 977680
random value 977680 is used by process with pid 29249 and pgid 29245
showing 26299592
random value 26299592 is used by process with pid 29250 and pgid 29245
showing 25982957
random value 25982957 is used by process with pid 29251 and pgid 29245
showing 27830550
random value 27830550 is used by process with pid 29252 and pgid 29245
5 processes found
 
# SIGUSR1:
starting a new process at 2019/04/06 16:18:56
process started with pid 29618 and random value 22120010
 
# SIGUSR2:
status of running processes at 2019/04/06 16:18:56
showing 28046084
random value 28046084 is used by process with pid 29248 and pgid 29245
showing 977680
random value 977680 is used by process with pid 29249 and pgid 29245
showing 26299592
random value 26299592 is used by process with pid 29250 and pgid 29245
showing 25982957
random value 25982957 is used by process with pid 29251 and pgid 29245
showing 27830550
random value 27830550 is used by process with pid 29252 and pgid 29245
showing 22120010
random value 22120010 is used by process with pid 29618 and pgid 29245
6 processes found
 
# SIGURG:
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container
 
# SIGINT:
shutting down the processes at 2019/04/06 16:20:17
stopping 28046084
stopping 977680
stopping 26299592
stopping 25982957
stopping 27830550
stopping 22120010
6 processes stopped
 
# SIGUSR2:
status of running processes at 2019/04/06 16:20:18
0 processes found
 
# SIGPWR:
starting a few processes at 2019/04/06 16:20:50
process started with pid 29959 and random value 2649735
process started with pid 29960 and random value 14971836
process started with pid 29961 and random value 14339677
process started with pid 29962 and random value 4460665
process started with pid 29963 and random value 12688731
5 processes started
 
started processes:
29245 29959 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 2649735
29245 29960 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 14971836
29245 29961 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 14339677
29245 29962 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 4460665
29245 29963 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 12688731
 
# SIGUSR2:
status of running processes at 2019/04/06 16:20:53
showing 2649735
random value 2649735 is used by process with pid 29959 and pgid 29245
showing 14971836
random value 14971836 is used by process with pid 29960 and pgid 29245
showing 14339677
random value 14339677 is used by process with pid 29961 and pgid 29245
showing 4460665
random value 4460665 is used by process with pid 29962 and pgid 29245
showing 12688731
random value 12688731 is used by process with pid 29963 and pgid 29245
5 processes found
 
# SIGTERM:
shutting down the container at 2019/04/06 16:21:42
 
shutting down the processes at 2019/04/06 16:21:42
stopping 2649735
stopping 14971836
stopping 14339677
stopping 4460665
stopping 12688731
5 processes stopped

In the command terminal:

/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR1 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGURG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGINT $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGPWR $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGTERM $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')

Of course, sending the untrappable SIGKILL signal will abort the process that executes dctm.sh. However, its child processes will survive and be reparented to the init process (PID 1):

...
status of running processes at 2019/04/10 22:38:25
showing 19996889
random value 19996889 is used by process with pid 24520 and pgid 24398
showing 5022831
random value 5022831 is used by process with pid 24521 and pgid 24398
showing 1363197
random value 1363197 is used by process with pid 24522 and pgid 24398
showing 18185959
random value 18185959 is used by process with pid 24523 and pgid 24398
showing 10996678
random value 10996678 is used by process with pid 24524 and pgid 24398
5 processes found
# /bin/kill --signal SIGKILL $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
Killed
 
ps -ef | grep number | grep -v grep
docker 24520 1 0 22:38 pts/1 00:00:00 I am number 19996889
docker 24521 1 0 22:38 pts/1 00:00:00 I am number 5022831
docker 24522 1 0 22:38 pts/1 00:00:00 I am number 1363197
docker 24523 1 0 22:38 pts/1 00:00:00 I am number 18185959
docker 24524 1 0 22:38 pts/1 00:00:00 I am number 10996678
 
# manual killing those processes;
ps -ef | grep number | grep -v grep | gawk '{print $2}' | xargs kill -9

 
ps -ef | grep number | grep -v grep
<empty>
 
# this works too:
kill -9 $(pgrep -f "I am number [0-9]+$")
# or, shorter:
pkill -f "I am number [0-9]+$"

Note that there is a simpler way to kill those related processes: by using their PGID, or process group id:

ps -axjf | grep number | grep -v grep
1 25248 25221 24997 pts/1 24997 S 1000 0:00 I am number 3489651
1 25249 25221 24997 pts/1 24997 S 1000 0:00 I am number 6789321
1 25250 25221 24997 pts/1 24997 S 1000 0:00 I am number 15840638
1 25251 25221 24997 pts/1 24997 S 1000 0:00 I am number 19059205
1 25252 25221 24997 pts/1 24997 S 1000 0:00 I am number 12857603
# processes have been reparented to PPID == 1;
# column 3 is the PGID;
# kill them using negative-PGID;
kill -9 -25221
ps -axjf | grep number | grep -v grep
<empty>

This is why the status command displays the PGID.
In order to tell kill that the given PID is actually a PGID, it has to be prefixed with a minus sign. Alternatively, the command:

pkill -g pgid

does that too.
All this looks quite promising so far!
Please join me now in part II of this article for the dockerization of the test script.

This article How to stop Documentum processes in a docker container, and more (part I) appeared first on Blog dbi services.

How to stop Documentum processes in a docker container, and more (part II)

Sat, 2019-04-13 07:00
ok, Ok, OK, and the docker part ?

In a minute.
In part I of this 2-part article, we showed how traps could be used to control a running executable from the outside. We also presented a bash test script to try out and play with traps. Now that we are confident about that simulation script, let’s dockerize it and try it out in this new environment. We use the dockerfile Dockerfile-dctm to create the CS image and so we include an ENTRYPOINT clause as follows:

FROM ubuntu:latest
RUN apt-get update &&      \
    apt-get install -y gawk
COPY dctm.sh /root/.
ENTRYPOINT ["/root/dctm.sh", "start"]

The above ENTRYPOINT exec-form syntax makes the dctm.sh script run with PID 1: no intermediate shell wraps it, so the script (through its shebang interpreter, bash) is the first process started in the container. To keep the dockerfile simple, the script runs as root. In the real world, CS processes run under an account such as dmadmin, so this account would have to be set up in the dockerfile (or through some orchestration software); a possible sketch of that is shown below.
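Purely as an illustration, the tail end of Dockerfile-dctm could be adapted like this for a dedicated account; the user name, UID and target path are assumptions of mine, not requirements:

# hypothetical variant: run the proxy under a dedicated account instead of root;
RUN useradd --create-home --uid 1001 dmadmin
COPY --chown=dmadmin:dmadmin dctm.sh /home/dmadmin/dctm.sh
USER dmadmin
ENTRYPOINT ["/home/dmadmin/dctm.sh", "start"]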
When the docker image is run or the container is started, the dctm.sh script gets executed with PID 1; as the script is invoked with the start option, it starts the processes. Afterwards, it just sits there waiting for the SIGTERM signal from the docker stop command; once received, it shuts down all the running processes under its control and exits, which also stops the container. Additionally, it can listen and react to some other signals, just like when it runs outside of a container.

Testing

Let’s test this approach with a container built using the above simple Dockerfile-dctm. Since the container is started in interactive mode, its output is visible on the screen and the commands to test it have to be sent from another terminal session; as before, for clarity, the commands have been inserted in the transcript as comments right before their result.

docker build -f Dockerfile-dctm --tag=dctm .
Sending build context to Docker daemon 6.656kB
Step 1/5 : FROM ubuntu:latest
---> 1d9c17228a9e
Step 2/5 : RUN apt-get update && apt-get install -y gawk
---> Using cache
---> f550d88161b6
Step 3/5 : COPY dctm.sh /root/.
---> e15e3f4ea93c
Step 4/5 : HEALTHCHECK --interval=5s --timeout=2s --retries=1 CMD grep -q OK /tmp/status || exit 1
---> Running in 0cea23cec09e
Removing intermediate container 0cea23cec09e
---> f9bf4138eb83
Step 5/5 : ENTRYPOINT ["/root/dctm.sh", "start"] ---> Running in 670c5231d5d8
Removing intermediate container 670c5231d5d8
---> 27991672905e
Successfully built 27991672905e
Successfully tagged dctm:latest
 
# docker run -i --name=dctm dctm
process started with pid 9 and random value 32760057
process started with pid 10 and random value 10364519
process started with pid 11 and random value 2915264
process started with pid 12 and random value 3744070
process started with pid 13 and random value 23787621
5 processes started
 
started processes:
1 9 1 1 ? -1 S 0 0:00 I am number 32760057
1 10 1 1 ? -1 S 0 0:00 I am number 10364519
1 11 1 1 ? -1 S 0 0:00 I am number 2915264
1 12 1 1 ? -1 S 0 0:00 I am number 3744070
1 13 1 1 ? -1 S 0 0:00 I am number 23787621
 
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container
 
# docker kill --signal=SIGUSR2 dctm
status of running processes at 2019/04/06 14:56:14
showing 32760057
random value 32760057 is used by process with pid 9 and pgid 1
showing 10364519
random value 10364519 is used by process with pid 10 and pgid 1
showing 2915264
random value 2915264 is used by process with pid 11 and pgid 1
showing 3744070
random value 3744070 is used by process with pid 12 and pgid 1
showing 23787621
random value 23787621 is used by process with pid 13 and pgid 1
5 processes found
 
# docker kill --signal=SIGURG dctm
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container
 
# docker kill --signal=SIGUSR1 dctm
starting a new process at 2019/04/06 14:57:30
process started with pid 14607 and random value 10066771
 
# docker kill --signal=SIGABRT dctm
shutting down the processes at 2019/04/06 14:58:12
stopping 32760057
stopping 10364519
stopping 2915264
stopping 3744070
stopping 23787621
stopping 10066771
6 processes stopped
 
# docker kill --signal=SIGUSR2 dctm
status of running processes at 2019/04/06 14:59:01
0 processes found
 
# docker kill --signal=SIGTERM dctm
shutting down the container at 2019/04/06 14:59:19
 
shutting down the processes at 2019/04/06 14:59:19
0 processes stopped
 
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

We observe exactly the same behavior as with the stand-alone dctm.sh, which is comforting.
Moreover, when the container is stopped, the signal is trapped correctly by the proxy:

...
random value 14725194 is used by process with pid 29 and pgid 1
showing 12554300
random value 12554300 is used by process with pid 30 and pgid 1
5 processes found
 
# date -u +"%Y/%m/%d %H:%M:%S"; docker stop dctm
# 2019/04/10 22:51:47
# dctm
shutting down the container at 2019/04/10 22:51:47
 
shutting down the processes at 2019/04/10 22:51:47
stopping 36164161
stopping 6693775
stopping 11404415
stopping 14725194
stopping 12554300
5 processes stopped

The good thing is that if the docker daemon is stopped at the host level, either interactively or at system shut down, the daemon first sends a SIGTERM to every running container:

date --utc +"%Y/%m/%d %H-%M-%S"; sudo systemctl stop docker
2019/04/06 15-02-18
[sudo] password for docker:

and on the other terminal:

shutting down the container at 2019/04/06 15:02:39
 
shutting down the processes at 2019/04/06 15:02:39
stopping 17422702
stopping 30251419
stopping 14451888
stopping 14890733
stopping 1105445
5 processes stopped

so each container can process the signal according to its needs. Our future Documentum container is now ready for a clean shutdown.

Doing something useful instead of sitting idle: light monitoring

As said, the proxy script waits for a signal from within a loop; the action performed inside the loop is waiting for an input from stdin, which is not particularly useful. Why not take advantage of this slot to make it do something useful, like monitoring the running processes? Such a function already exists in the script: status_all(). Thus, let’s set this up:

# while true; do read; done
# do something useful instead;
while true; do
   status_all
   sleep 30
done

We quickly notice that the signals are not processed so briskly any more. In effect, bash waits until the currently executing command completes before processing a signal, here any command inside the loop, so a slight delay is perceptible before our signals are taken care of, especially if we are in the middle of a ‘sleep 600’ command. Moreover, incoming signals of the same type are not stacked up; they replace one another so that only the most recent one gets processed. In practical conditions, this is not a problem, for it is still possible to send signals and have them processed, just not in burst mode. If better responsiveness to signals is needed, the sleep duration should be shortened and/or a separate scheduling of the monitoring be introduced (started asynchronously from a loop in the entrypoint, or from a crontab inside the container?).
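As a side note, a common bash idiom (a suggestion of mine, not something the script above uses) is to run sleep in the background and wait on it: the wait builtin returns as soon as a trapped signal arrives, so the handler runs immediately instead of after the current sleep:

while true; do
   status_all
   # sleep asynchronously and wait on it; a trapped signal interrupts wait right away,
   # the trap handler runs, and the loop then moves on to its next iteration;
   sleep 30 & wait $!
done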
Note that the status sent to stdout from within a detached container (i.e. one started without the -i interactive option, which is generally the case) is not visible outside the container. Fortunately, and even better, the docker logs command makes it possible to view the status output on demand:

docker logs --follow container_name

In our case:

docker logs --follow dctm
status of running processes at 2019/04/06 15:21:21
showing 8235843
random value 8235843 is used by process with pid 8 and pgid 1
showing 16052839
random value 16052839 is used by process with pid 9 and pgid 1
showing 1097668
random value 1097668 is used by process with pid 10 and pgid 1
showing 5113933
random value 5113933 is used by process with pid 11 and pgid 1
showing 1122110
random value 1122110 is used by process with pid 12 and pgid 1
5 processes found

Note too that the logs command also has a --timestamps option for prefixing the output lines with the time they were produced, as illustrated below:

docker logs --timestamps --since 2019-04-06T18:06:23 dctm
2019-04-06T18:06:23.607796640Z status of running processes at 2019/04/06 18:06:23
2019-04-06T18:06:23.613666475Z showing 7037074
2019-04-06T18:06:23.616334029Z random value 7037074 is used by process with pid 8 and pgid 1
2019-04-06T18:06:23.616355592Z showing 33446655
2019-04-06T18:06:23.623719975Z random value 33446655 is used by process with pid 9 and pgid 1
2019-04-06T18:06:23.623785755Z showing 17309380
2019-04-06T18:06:23.627050839Z random value 17309380 is used by process with pid 10 and pgid 1
2019-04-06T18:06:23.627094599Z showing 13859725
2019-04-06T18:06:23.630436025Z random value 13859725 is used by process with pid 11 and pgid 1
2019-04-06T18:06:23.630472176Z showing 26767323
2019-04-06T18:06:23.633304616Z random value 26767323 is used by process with pid 12 and pgid 1
2019-04-06T18:06:23.635900480Z 5 processes found
2019-04-06T18:06:26.640490424Z

This is handy, though still not perfect, for those cases where lazy programmers neglect to date their logs' entries.
Now that we have a light-weight monitoring in place, we can use it in the dockerfile's HEALTHCHECK clause so the container's health shows up in the ps command. As the processes' status is already determined in the wait loop of the dctm.sh script, it is pointless to compute it again. Instead, we can modify status_all() to print the overall status into a file, say /tmp/status, which HEALTHCHECK reads every $INTERVAL period. If status_all() is invoked every $STATUS_PERIOD, a race condition can occur every LeastCommonMultiple($INTERVAL, $STATUS_PERIOD), i.e. whenever these 2 processes access the file simultaneously, the former in reading mode and the latter in writing mode. To avoid this nasty situation, status_all() first writes into /tmp/tmp_status and then renames this file to /tmp/status, which is an atomic operation. For the sake of our example, let's decide that the container is unhealthy if no dummy processes are running, and healthy if at least one is running (in real conditions, the container would be healthy if ALL the processes are responding and unhealthy if ANY of them is not, but it also depends on the definition of health). Here is the new dctm.sh's status_all() function:

status_all() {
   echo; echo "status of running processes at $(date +"%Y/%m/%d %H:%M:%S")"
   nb_processes=0
   for no in $(ps -ef | grep "I am number " | grep -v grep | gawk '{print $NF}'); do
      echo "showing $no"
      func status $no
      (( nb_processes++ ))
   done
   echo "$nb_processes processes found"
   if [[ $nb_processes -eq 0 ]]; then
      printf "status: bad\n" > /tmp/tmp_status
   else
      printf "status: OK\n" > /tmp/tmp_status
   fi
   mv /tmp/tmp_status /tmp/status
}

Here is the new dockerfile:

FROM ubuntu:latest
RUN apt-get update &&      \
    apt-get install -y gawk
COPY dctm.sh /root/.
HEALTHCHECK --interval=10s --timeout=2s --retries=2 CMD grep -q OK /tmp/status || exit 1
ENTRYPOINT ["/root/dctm.sh", "start"]

Here is what the ps command shows now:

# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
64e22a8f75cd dctm "/root/dctm.sh start" 38 minutes ago Up 2 seconds (health: starting) dctm
 
...
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
64e22a8f75cd dctm "/root/dctm.sh start" 38 minutes ago Up 6 seconds (healthy) dctm

The STATUS column now also displays the container's current health status.
If a new build is unwanted, the clause can instead be specified when running the image:

docker run --name dctm --health-cmd "grep -q OK /tmp/status || exit 1" --health-interval=10s --health-timeout=2s --health-retries=1 dctm

Note how these parameters are prefixed with "health-" so they can be mapped to the corresponding options of the HEALTHCHECK clause.
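As a quick sanity check, not shown in the original run and assuming the Healthcheck field that docker inspect exposes under Config, the effective health check configuration can be read back from the running container:

docker inspect --format='{{json .Config.Healthcheck}}' dctm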
Now, in order to observe how the status is updated, let’s play with the signals INT and PWR to respectively stop and launch processes inside the container:

# current situation:
docker logs dctm
status of running processes at 2019/04/12 14:05:00
showing 29040429
random value 29040429 is used by process with pid 1294 and pgid 1
showing 34302125
random value 34302125 is used by process with pid 1295 and pgid 1
showing 2979702
random value 2979702 is used by process with pid 1296 and pgid 1
showing 4661756
random value 4661756 is used by process with pid 1297 and pgid 1
showing 7169283
random value 7169283 is used by process with pid 1298 and pgid 1
5 processes found
 
# show status:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff25beae71f0 dctm "/root/dctm.sh start" 55 minutes ago Up 55 minutes (healthy) dctm
 
# stop the processes:
docker kill --signal=SIGINT dctm
# wait up to the given health-interval and check again:
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff25beae71f0 dctm "/root/dctm.sh start" 57 minutes ago Up 57 minutes (unhealthy) dctm
 
# restart the processes:
docker kill --signal=SIGPWR dctm
 
# wait up to the health-interval and check again:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff25beae71f0 dctm "/root/dctm.sh start" About an hour ago Up About an hour (healthy) dctm

The health check works as expected.
Note that the above successful HEALTHCHECK tests were done under Centos Linux release 7.6.1810 (Core) with docker Client Version 1.13.1 and API version 1.26, Server Version 1.13.1 and API version 1.26 (minimum version 1.12).
The HEALTHCHECK clause looks broken under Ubuntu 18.04.1 LTS with docker Client Version 18.09.1 and API version 1.39, Server Engine – Community Engine Version 18.09.1 and API version 1.39 (minimum version 1.12). After a change of status, the health check gets stuck: the container stays in the unhealthy state while "docker ps" keeps showing "healthy", regardless of subsequent changes in the processes running inside the container. It looks like the monitoring cycles until an unhealthy condition occurs, then stops cycling and remains in that state, as is also visible in the timestamps when inspecting the container's status:

docker inspect --format='{{json .State.Health}}' dctm
{"Status":"unhealthy","FailingStreak":0,"Log":[{"Start":"2019-04-12T16:04:18.995957081+02:00","End":"2019-04-12T16:04:19.095540448+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:21.102151004+02:00","End":"2019-04-12T16:04:21.252025292+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:23.265929424+02:00","End":"2019-04-12T16:04:23.363387974+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:25.372757042+02:00","End":"2019-04-12T16:04:25.471229004+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:27.47692396+02:00","End":"2019-04-12T16:04:27.580458001+02:00","ExitCode":0,"Output":""}]}

The last 5 entries stop being updated.
While we are mentioning bugs: "docker logs --tail 0 dctm" under Centos displays the whole log available so far, so specify at least 1 to reduce the log history output to a minimum. Under Ubuntu, it works as expected. Conversely, the "--follow" option works under Centos but not under Ubuntu. So there is some instability here; be prepared to comprehensively test every docker feature you intend to use.
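For instance, under Centos the workaround is simply (a trivial illustration, not from the original tests):

docker logs --tail 1 dctm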

Using docker’s built-in init process

As said above, docker does not have a full-fledged init process like systemd but it still offers something vaguely related, tini, which stands for "tiny init", see here. It won't solve the inability of Documentum's processes to respond to signals, and therefore the proxy script is still needed. However, in addition to forwarding signals to its child process, tini has the advantage of taking care of defunct processes, or zombies, by reaping them regularly. Documentum produces a lot of them; they eventually disappear in the long run, but tini can speed this up a little bit.
tini can be invoked from the command-line as follows:

docker run -i --name=dctm --init dctm

But it is also possible to integrate it directly in the dockerfile so the --init option won't be needed any longer (and shouldn't be used, otherwise tini would not be PID 1 and its reaping feature would no longer work, making it useless for us):

FROM ubuntu:latest
COPY dctm.sh /root/.
ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN apt-get update &&      \
    apt-get install -y gawk &&      \
    chmod +x /tini
HEALTHCHECK --interval=10s --timeout=2s --retries=2 CMD grep -q OK /tmp/status || exit 1
ENTRYPOINT ["/tini", "--"]
# let tini launch the proxy;
CMD ["/root/dctm.sh", "start"]

Let’s build the image with tini:

docker build -f Dockerfile-dctm --tag=dctm:with-tini .
Sending build context to Docker daemon 6.656kB
Step 1/8 : FROM ubuntu:latest
---> 1d9c17228a9e
Step 2/8 : COPY dctm.sh /root/.
---> a724637581fe
Step 3/8 : ENV TINI_VERSION v0.18.0
---> Running in b7727fc065e9
Removing intermediate container b7727fc065e9
---> d1e1a17d7255
Step 4/8 : ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
Downloading [==================================================>] 24.06kB/24.06kB
---> 47b1fc9f82c7
Step 5/8 : RUN apt-get update && apt-get install -y gawk && chmod +x /tini
---> Running in 4543b6f627f3
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB] Get:2 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB] ...
Step 6/8 : HEALTHCHECK --interval=5s --timeout=2s --retries=1 CMD grep -q OK /tmp/status || exit 1
---> Running in d2025cbde647
Removing intermediate container d2025cbde647
---> a17fd24c4819
Step 7/8 : ENTRYPOINT ["/tini", "--"]
---> Running in ee1e10062f22
Removing intermediate container ee1e10062f22
---> f343d21175d9
Step 8/8 : CMD ["/root/dctm.sh", "start"]
---> Running in 6d41f591e122
Removing intermediate container 6d41f591e122
---> 66541b8c7b37
Successfully built 66541b8c7b37
Successfully tagged dctm:with-tini

Let’s run the image:

docker run -i --name=dctm dctm:with-tini
 
starting a few processes at 2019/04/07 11:55:30
process started with pid 9 and random value 23970936
process started with pid 10 and random value 35538668
process started with pid 11 and random value 12039907
process started with pid 12 and random value 21444522
process started with pid 13 and random value 7681454
5 processes started
...

And let’s see how the container’s processes look like with tini from another terminal:

docker exec -it dctm /bin/bash
ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:55 ? 00:00:00 /tini -- /root/dctm.sh start
root 6 1 0 11:55 ? 00:00:00 /bin/bash /root/dctm.sh start
root 9 6 0 11:55 ? 00:00:00 I am number 23970936
root 10 6 0 11:55 ? 00:00:00 I am number 35538668
root 11 6 0 11:55 ? 00:00:00 I am number 12039907
root 12 6 0 11:55 ? 00:00:00 I am number 21444522
root 13 6 0 11:55 ? 00:00:00 I am number 7681454
root 174 0 0 11:55 ? 00:00:00 /bin/bash
root 201 6 0 11:55 ? 00:00:00 sleep 3
root 208 174 0 11:55 ? 00:00:00 ps -ef
...

So tini is really running with PID == 1 and has started the proxy as its child process as expected.
Let’s test the container by sending a few signals:

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5f745485a907 dctm:with-tini "/tini -- /root/dctm…" 8 seconds ago Up 7 seconds (healthy) dctm
 
# docker kill --signal=SIGINT dctm
shutting down the processes at 2019/04/07 11:59:42
stopping 23970936
stopping 35538668
stopping 12039907
stopping 21444522
stopping 7681454
5 processes stopped
 
status of running processes at 2019/04/07 11:59:42
0 processes found
 
status of running processes at 2019/04/07 11:59:45
0 processes found
 
# docker kill --signal=SIGTERM dctm
shutting down the processes at 2019/04/07 12:00:00
0 processes stopped

and then the container gets stopped. So the signals are indeed transmitted to tini's child process.
If one prefers to use the run command's --init option instead of modifying the dockerfile to introduce tini as the ENTRYPOINT, it is even better because there is only one version of the dockerfile to maintain. Here is the invocation and how the processes look:

docker run --name=dctm --init dctm
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d9fa0d98817 dctm "/root/dctm.sh start" 4 seconds ago Up 3 seconds (health: starting) dctm
docker exec dctm /bin/bash -c "ps -ef"
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 12:11 ? 00:00:00 /dev/init -- /root/dctm.sh start
root 6 1 0 12:11 ? 00:00:00 /bin/bash /root/dctm.sh start
root 9 6 0 12:11 ? 00:00:00 I am number 23850948
root 10 6 0 12:11 ? 00:00:00 I am number 19493606
root 11 6 0 12:11 ? 00:00:00 I am number 34535435
root 12 6 0 12:11 ? 00:00:00 I am number 32571187
root 13 6 0 12:11 ? 00:00:00 I am number 35596440
root 116 0 1 12:11 ? 00:00:00 /bin/bash
root 143 6 0 12:11 ? 00:00:00 sleep 3
root 144 116 0 12:11 ? 00:00:00 ps -ef

It looks even better; tini is still there, presumably, but hidden behind /dev/init, so the container will be immune to any future change in the default init process.

Adapting dctm.sh for Documentum

Adapting the proxy script to a real Documentum installation with its own central stop/start/status script, let's name it dctm_start_stop.sh, is easy. The main changes are limited to the func() function; it now just relays the commands to dctm_start_stop.sh:

#!/bin/bash
# launches in the background, or stops or queries the status of, the Documentum dctm_start_stop.sh script;
# Usage:
#   ./dctm.sh stop | start | status
# e.g.:
# $ ./dctm.sh start
# cec - dbi-services - April 2019
#
trap 'help'         SIGURG
trap 'start_all'    SIGPWR
trap 'status_all'   SIGUSR2
trap 'stop_all'     SIGINT SIGABRT
trap 'shutdown_all' SIGHUP SIGQUIT SIGTERM

verb="sleep"
export id_prefix="I am number"

func() {
   cmd="$1"
   case $cmd in
      start)
         ./dctm_start_stop.sh start &
         ;;
      stop)
         ./dctm_start_stop.sh stop &
         ;;
      status)
         ./dctm_start_stop.sh status
         return $?
         ;;
   esac
}

help() {
   echo
   echo "send signal SIGURG for help"
   echo "send signal SIGPWR to start the Documentum processes"
   echo "send signal SIGUSR2 for the list of Documentum started processes"
   echo "send signal SIGINT | SIGABRT to stop all the processes"
   echo "send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the Documentum processes and exit the container"
}

start_all() {
   echo; echo "starting the Documentum processes at $(date +"%Y/%m/%d %H:%M:%S")"
   func start
}

status_all() {
   echo; echo "status of Documentum processes at $(date +"%Y/%m/%d %H:%M:%S")"
   func status
   # dctm_start_stop.sh's status action returns 0 when all is well, 1 otherwise (see the skeleton below);
   if [[ $? -eq 0 ]]; then
      printf "status: OK\n" > /tmp/tmp_status
   else
      printf "status: bad\n" > /tmp/tmp_status
   fi
   mv /tmp/tmp_status /tmp/status
}

stop_all() {
   echo; echo "shutting down the Documentum processes at $(date +"%Y/%m/%d %H:%M:%S")"
   func stop
}

shutdown_all() {
   echo; echo "shutting down the container at $(date +"%Y/%m/%d %H:%M:%S")"
   stop_all
   exit 0
}

# -----------
# main;
# -----------

# start the Documentum processes if requested;
[[ "$1" = "start" ]] && start_all

# make sure the container stays up and waits for signals;
while true; do status_all; sleep 3; done

Here is a skeleton of the script dctm_start_stop.sh:

#!/bin/bash
   cmd="$1"
   case $cmd in
      start)
         # insert here your Documentum installation's start scripts, e.g.
         # /app/dctm/dba/dm_launch_Docbroker
         # /app/dctm/dba/dm_start_testdb
         # /app/dctm/shared/wildfly9.0.1/server/startMethodServer.sh &
         echo "started"
         ;;
      stop)
         # insert here your Documentum installation's stop scripts, e.g.
         # /app/dctm/shared/wildfly9.0.1/server/stopMethodServer.sh
         # /app/dctm/dba/dm_shutdown_testdb
         # /app/dctm/dba/dm_stop_Docbroker
         echo "stopped"
         ;;
      status)
         # insert here your statements to test the Documentum processes' health;
         # e.g. dmqdocbroker -c -p 1489 ....
         # e.g. idql testdb -Udmadmin -Pxxx to try to connect to the docbase;
         # e.g. wget http://localhost:9080/... to test the method server;
         # 0: OK, 1: NOK;
         exit 0
         ;;
   esac

Let’s introduce a slight modification in the dockerfile’s entrypoint clause: instead of having the Documentum processes start at container startup, the container will start with only the proxy running inside. Only upon receiving the signal SIGPWR will the proxy start all the Documentum processes:

ENTRYPOINT ["/root/dctm.sh", ""]

If the light-weight monitoring is in action, the container will first be flagged unhealthy, but this can be a useful reminder.
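A possible start sequence would then look like this (a hypothetical illustration reusing the container name and signals from above; -d simply detaches the container):

# start the container; only the proxy and its monitoring loop are running;
docker run -d --name=dctm dctm
 
# later, start the Documentum processes on demand;
docker kill --signal=SIGPWR dctm
 
# after the next health-interval, the container should turn healthy;
docker ps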
Note that the monitoring could also be activated or deactivated through a signal, as shown in the diff output below:

diff dctm-no-monitoring.sh dctm.sh
21a22
> trap 'status_on_off' SIGCONT
24a26
> bStatus=1
119a122,125
> status_on_off() {
>    (( bStatus = (bStatus + 1) % 2 ))
> }
> 
132c138
<    status_all
---
>    [[ $bStatus -eq 1 ]] && status_all

This is more flexible and better matches real operating conditions.
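With that change in place, the monitoring can be paused and resumed from the host, e.g. (reusing the container name from above):

# each SIGCONT flips bStatus between 0 and 1, i.e. pauses or resumes the periodic status_all;
docker kill --signal=SIGCONT dctm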

Shortening docker commands

We have thrown a lot of docker commands at you. If they are used often, their verbosity can be alleviated through aliases, e.g.:

alias di='docker images'
alias dpsas='docker ps -as'
alias dps='docker ps'
alias dstatus='docker kill --signal=SIGUSR2'
alias dterm='docker kill --signal=SIGTERM'
alias dabort='docker kill --signal=SIGABRT'
alias dlogs='docker logs --follow'
alias dstart='docker start'
alias dstarti='docker start -i'
alias dstop="docker stop"
alias drm='docker container rm'

or even bash functions for the most complicated ones (to be appended into e.g. your ~/.bashrc):

function drun {
   image="$1"
   docker run -i --name=$image $image
}
 
function dkill {
   signal=$1
   container=$2
   docker kill --signal=$signal $container
}
 
function dbuild {
   docker build -f Dockerfile-dctm --tag=$1 .
}

The typical sequence for testing the dockerfile Dockerfile-dctm to produce the image dctm and run it as the dctm container is:

dbuild dctm
drm dctm
drun dctm

Much less typing.

Conclusion

At the end of the day, it is not such a big deal that the Documentum CS does not process the signals sent to it, for it is easy to work around this omission and even go beyond basic stops and starts. As always, missing features or shortcomings become a source of inspiration and enhancements!
Containerization has lots of advantages but we have noticed that docker’s implementations vary between versions and platforms so some features don’t always work as expected, if at all.
In a future blog, I'll show a containerization of the out-of-the-box CS that includes signal trapping. In the meantime, live long and don't despair.

Cet article How to stop Documentum processes in a docker container, and more (part II) est apparu en premier sur Blog dbi services.

PostgreSQL 12: Explain will display custom settings, if instructed

Wed, 2019-04-10 13:35

How many times have you tried to solve a performance issue but not been able to reproduce the explain plan? Whatever you tried, you always got a different result. Let's say you managed to get a dump of the database in question, set all the PostgreSQL parameters to the same values and gathered statistics, but you still do not get the same plan as the person who reported the issue. What could be the cause here? Let's do a short demo:

Imagine someone sends you this plan for a simple count(*) against pg_class:

postgres=# explain (analyze) select count(*) from pg_class;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23.10..23.11 rows=1 width=8) (actual time=0.293..0.293 rows=1 loops=1)
   ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..22.12 rows=390 width=0) (actual time=0.103..0.214 rows=390 loops=1)
         Heap Fetches: 0
 Planning Time: 0.155 ms
 Execution Time: 0.384 ms
(5 rows)

When you try the same in your environment, the plan always looks like this (a sequential scan instead of an index only scan):

postgres=# explain (analyze) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.322..0.323 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.017..0.220 rows=390 loops=1)
 Planning Time: 1.623 ms
 Execution Time: 0.688 ms
(4 rows)

In this case the index only scan is even faster, but usually you get a sequential scan because its cost is lower. Whatever you try, you cannot reproduce the plan. What you can't know is that the person reporting the issue didn't tell you about this:

postgres=# set enable_seqscan = off;
SET
postgres=# explain (analyze) select count(*) from pg_class;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23.10..23.11 rows=1 width=8) (actual time=0.230..0.230 rows=1 loops=1)
   ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..22.12 rows=390 width=0) (actual time=0.032..0.147 rows=390 loops=1)
         Heap Fetches: 0
 Planning Time: 0.130 ms
 Execution Time: 0.281 ms

Just before executing the statement, a parameter was changed which influences PostgreSQL's choice of the best plan. And this is where a new feature of PostgreSQL 12 comes in handy:

postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23.10..23.11 rows=1 width=8) (actual time=0.309..0.310 rows=1 loops=1)
   ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..22.12 rows=390 width=0) (actual time=0.045..0.202 rows=390 loops=1)
         Heap Fetches: 0
 Settings: enable_seqscan = 'off'
 Planning Time: 0.198 ms
 Execution Time: 0.395 ms
(6 rows)

postgres=# 

From PostgreSQL 12 on, you can ask explain to display any setting that was changed and influenced the decision on which plan to choose. These might be optimizer parameters, as here, but also other parameters whenever they differ from their global setting:

postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.197..0.198 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.016..0.121 rows=390 loops=1)
 Settings: work_mem = '64MB'
 Planning Time: 0.162 ms
 Execution Time: 0.418 ms
(5 rows)

… or:

postgres=# set from_collapse_limit = 13;
SET
postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.190..0.190 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.012..0.115 rows=390 loops=1)
 Settings: from_collapse_limit = '13', work_mem = '64MB'
 Planning Time: 0.185 ms
 Execution Time: 0.263 ms
(5 rows)

A nice addition. By asking people to use the "settings" switch with explain (analyze), you can be sure about what was changed from the global settings, so it is much easier to reproduce the issue and to see what's going on.

Parameters that do not influence the plan do not pop up:

postgres=# set log_statement='all';
SET
postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.199..0.200 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.018..0.124 rows=390 loops=1)
 Settings: from_collapse_limit = '13', work_mem = '64MB'
 Planning Time: 0.161 ms
 Execution Time: 0.391 ms
(5 rows)

Cet article PostgreSQL 12: Explain will display custom settings, if instructed est apparu en premier sur Blog dbi services.
