Yann Neuhaus

dbi services technical blog

Windows Docker containers, when platform matters

Wed, 2019-07-31 07:58

A couple of days ago, I got a question from a customer about an issue he ran into when trying to spin up a container on Windows.

The context was as follows:

> docker container run hello-world:nanoserver
Unable to find image 'hello-world:nanoserver' locally
nanoserver: Pulling from library/hello-world
C:\Program Files\Docker\docker.exe: no matching manifest for windows/amd64 10.0.14393 in the manifest list entries.
See 'C:\Program Files\Docker\docker.exe run --help'.

 

I thought that was very interesting because it pointed out some considerations about Docker image architecture design. First, we must bear in mind that containers and the underlying host share a single kernel by design, so the container's base image must match the operating system (and architecture) of the host.

Let's begin with containers in the Linux world because it highlights the concept of kernel sharing between different distros. In this demo, let's say I'm running a Linux Ubuntu Server 16.04 …

$ cat /etc/os-release | grep -i version
VERSION="16.04.6 LTS (Xenial Xerus)"
VERSION_ID="16.04"
VERSION_CODENAME=xenial

 

… and let’s say I want to run a container based on Centos 6.6 …

$ docker run --rm -ti centos:6.6 cat /etc/centos-release
Unable to find image 'centos:6.6' locally
6.6: Pulling from library/centos
5dd797628260: Pull complete
Digest: sha256:32b80b90ba17ed16e9fa3430a49f53ff6de0d4c76ad8631717a1373d5921fa26
Status: Downloaded newer image for centos:6.6
CentOS release 6.6 (Final)

 

You may wonder how it is possible to run different distros between the container and the host, and what the magic behind the scenes is. In fact, both the container and the host share the same Linux kernel: even though CentOS 6.6 ships with a 2.6 kernel while Ubuntu 16.04 ships with 4.4, this works because the kernel interfaces are backward compatible, so a user space built for an older kernel runs fine on a newer one. The commands below demonstrate that the centos container is using the same kernel as the host.

$ uname -r
4.4.0-142-generic
$ docker run --rm -ti centos:6.6 uname -r
4.4.0-142-generic

 

Let's say now that my Docker host is running on the x64 architecture. If we look at the CentOS image's supported architectures on Docker Hub, we notice several different ones:

From that list, we may deduce that a combination of different images and tags exists for each available architecture, and the interesting question is: how does Docker pull the correct one for my underlying architecture? This is where manifest lists come into play and allow multi-architecture images. A manifest list contains platform-specific references to single-platform manifests. We may inspect a manifest list through the docker manifest command (still experimental at the time of writing this blog post).
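Since the docker manifest command is an experimental CLI feature, it may need to be enabled first. A minimal sketch, assuming a recent Docker CLI (18.09 or later) that honours the DOCKER_CLI_EXPERIMENTAL environment variable (setting "experimental": "enabled" in ~/.docker/config.json should work as well):

$ # Enable experimental CLI features for the current shell session only
$ export DOCKER_CLI_EXPERIMENTAL=enabled
$ docker manifest inspect centos:7 | head -n 5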

For example, if I want to get a list of manifests and their corresponding architectures for the Centos 7, I can run docker manifest command as follows:

$ docker manifest inspect centos:7 --verbose
[
        {
                "Ref": "docker.io/library/centos:7@sha256:ca58fe458b8d94bc6e3072f1cfbd334855858e05e1fd633aa07cf7f82b048e66",
                "Descriptor": {
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "digest": "sha256:ca58fe458b8d94bc6e3072f1cfbd334855858e05e1fd633aa07cf7f82b048e66",
                        "size": 529,
                        "platform": {
                                "architecture": "amd64",
                                "os": "linux"
                        }
                },
                "SchemaV2Manifest": {
                        "schemaVersion": 2,
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "config": {
                                "mediaType": "application/vnd.docker.container.image.v1+json",
                                "size": 2182,
                                "digest": "sha256:9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1"
                        },
                        "layers": [
                                {
                                        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                        "size": 75403831,
                                        "digest": "sha256:8ba884070f611d31cb2c42eddb691319dc9facf5e0ec67672fcfa135181ab3df"
                                }
                        ]
                }
        },
        {
                "Ref": "docker.io/library/centos:7@sha256:9fd67116449f225c6ef60d769b5219cf3daa831c5a0a6389bbdd7c952b7b352d",
                "Descriptor": {
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "digest": "sha256:9fd67116449f225c6ef60d769b5219cf3daa831c5a0a6389bbdd7c952b7b352d",
                        "size": 529,
                        "platform": {
                                "architecture": "arm",
                                "os": "linux",
                                "variant": "v7"
                        }
                },
                "SchemaV2Manifest": {
                        "schemaVersion": 2,
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "config": {
                                "mediaType": "application/vnd.docker.container.image.v1+json",
                                "size": 2181,
                                "digest": "sha256:8c52f2d0416faa8009082cf3ebdea85b3bc1314d97925342be83bc9169178efe"
                        },
                        "layers": [
                                {
                                        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                        "size": 70029389,
                                        "digest": "sha256:193bcbf05ff9ae85ac1a58cacd9c07f8f4297dc648808c347cceb3797ae603af"
                                }
                        ]
                }
        },
        {
                "Ref": "docker.io/library/centos:7@sha256:f25f24daae92b5b5fe75bc0d5d9a3d2145906290f25aa434c43bfcefecd10dec",
                "Descriptor": {
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "digest": "sha256:f25f24daae92b5b5fe75bc0d5d9a3d2145906290f25aa434c43bfcefecd10dec",
                        "size": 529,
                        "platform": {
                                "architecture": "arm64",
                                "os": "linux",
                                "variant": "v8"
                        }
                },
                "SchemaV2Manifest": {
                        "schemaVersion": 2,
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "config": {
                                "mediaType": "application/vnd.docker.container.image.v1+json",
                                "size": 2183,
                                "digest": "sha256:7a51de8a65d533b6706fbd63beea13610e5486e49141610e553a3e784c133a37"
                        },
                        "layers": [
                                {
                                        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                        "size": 74163767,
                                        "digest": "sha256:90c48ff53512085fb5adaf9bff8f1999a39ce5e5b897f5dfe333555eb27547a7"
                                }
                        ]
                }
        },
        {
                "Ref": "docker.io/library/centos:7@sha256:1f832b4e3b9ddf67fd77831cdfb591ce5e968548a01581672e5f6b32ce1212fe",
                "Descriptor": {
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "digest": "sha256:1f832b4e3b9ddf67fd77831cdfb591ce5e968548a01581672e5f6b32ce1212fe",
                        "size": 529,
                        "platform": {
                                "architecture": "386",
                                "os": "linux"
                        }
                },
                "SchemaV2Manifest": {
                        "schemaVersion": 2,
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "config": {
                                "mediaType": "application/vnd.docker.container.image.v1+json",
                                "size": 2337,
                                "digest": "sha256:fe70670fcbec5e3b3081c6800cb531002474c36563689b450d678a34a89b62c3"
                        },
                        "layers": [
                                {
                                        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                        "size": 75654099,
                                        "digest": "sha256:39016a8400a36ce04799adba71f8678ae257d9d8dba638d81b8c5755f01fe213"
                                }
                        ]
                }
        },
        {
                "Ref": "docker.io/library/centos:7@sha256:2d9b27e9c89d511a58873254d86ecf96df0f599daae3d555d896fee9f49fedf4",
                "Descriptor": {
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "digest": "sha256:2d9b27e9c89d511a58873254d86ecf96df0f599daae3d555d896fee9f49fedf4",
                        "size": 529,
                        "platform": {
                                "architecture": "ppc64le",
                                "os": "linux"
                        }
                },
                "SchemaV2Manifest": {
                        "schemaVersion": 2,
                        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                        "config": {
                                "mediaType": "application/vnd.docker.container.image.v1+json",
                                "size": 2185,
                                "digest": "sha256:c9744f4afb966c58d227eb6ba03ab9885925f9e3314edd01d0e75481bf1c937d"
                        },
                        "layers": [
                                {
                                        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                                        "size": 76787221,
                                        "digest": "sha256:deab1c539926c1ca990d5d025c6b37c649bbba025883d4b209e3b52b8fdf514a"
                                }
                        ]
                }
        }
]

 

Each manifest entry contains different information including the image signature digest, the operating system and the supported architecture. Let’s pull the Centos:7 image:

$ docker pull centos:7
7: Pulling from library/centos
8ba884070f61: Pull complete
Digest: sha256:a799dd8a2ded4a83484bbae769d97655392b3f86533ceb7dd96bbac929809f3c
Status: Downloaded newer image for centos:7
docker.io/library/centos:7

 

Let’s have a look at the unique identifier of the centos:7 image:

$ docker inspect --format='{{.Id}}' centos:7
sha256:9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1

 

It corresponds to the config digest inside the SchemaV2Manifest entry related to the amd64 architecture (please refer to the docker manifest inspect output above). Another convenient way to query manifest lists and architectures is to go through the mplatform/mquery container as follows:

$ docker run mplatform/mquery centos:7
Image: centos:7
 * Manifest List: Yes
 * Supported platforms:
   - linux/amd64
   - linux/arm/v7
   - linux/arm64
   - linux/386
   - linux/ppc64le

 

However, for the Linux CentOS 6.6 image (used in my first demo), the support seems to be limited to the amd64 architecture:

$ docker run mplatform/mquery centos:6.6
Image: centos:6.6
 * Manifest List: Yes
 * Supported platforms:
   - linux/amd64

 

Now that we are aware of manifest lists and multi-architecture images, let's go back to the initial problem. The customer ran into a platform compatibility issue when trying to spin up the hello-world:nanoserver container on a Windows Server 2016 Docker host. As a reminder, the error message was:

no matching manifest for windows/amd64 10.0.14393 in the manifest list entries.

In a way, that may be surprising because Windows hosts and containers also share a single kernel. That's true, and it was in fact the root cause of my customer's issue. The image he wanted to pull only supports the following Windows platforms (queried from the manifest list):

> docker run mplatform/mquery hello-world:nanoserver
Image: hello-world:nanoserver
 * Manifest List: Yes
 * Supported platforms:
   - windows/amd64:10.0.17134.885
   - windows/amd64:10.0.17763.615

 

You may notice several supported Windows platforms, but with different operating system versions. Let's have a look at the Docker host version in the context of my customer:

> [System.Environment]::OSVersion.Version
Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      14393  0

 

The tricky part is that Windows Server 2016 comes with different branches – 1607, 1709 and 1803 – which aren't technically all the same Windows Server version: each branch comes with a different build number. According to the Microsoft documentation, when the build number (third column) changes, a new operating system version is published. In our case, it means the OS versions of the Windows Docker host and of the Docker image we tried to pull are different, hence the compatibility issue. Note, however, that images and containers may run on a host with a newer version, while the opposite is obviously not true. You can refer to the same Microsoft link to get a picture of Windows container and host compatibility.
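If you want to see that mismatch directly, the verbose manifest output also carries an os.version field for Windows image entries, which you can compare with the host build shown above. A hedged sketch from the Windows host (assuming the experimental docker manifest command is enabled there too):

> docker manifest inspect -v hello-world:nanoserver | findstr "os.version"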

How can we fix this issue? Well, we may go two ways here. The first one consists in re-installing a Docker host platform compatible with the corresponding image. The second one consists in using an image compatible with the current platform and, referring to the hello-world image tags, we have one. We may check the architecture compatibility by querying the manifest list as follows:

> docker run mplatform/mquery hello-world:nanoserver-sac2016
Image: hello-world:nanoserver-sac2016
 * Manifest List: Yes
 * Supported platforms:
   - windows/amd64:10.0.14393.2551

 

Let’s try to pull the image with the nanoserver-sac2016 tag:

> docker pull hello-world:nanoserver-sac2016
nanoserver-sac2016: Pulling from library/hello-world
bce2fbc256ea: Already exists
6f2071dcd729: Pull complete
909cdbafc9e1: Pull complete
a43e426cc5c9: Pull complete
Digest: sha256:878fd913010d26613319ec7cc83b400cb92113c314da324681d9fecfb5082edc
Status: Downloaded newer image for hello-world:nanoserver-sac2016
docker.io/library/hello-world:nanoserver-sac2016

 

Here we go!

See you!

Alfresco Clustering – Share

Wed, 2019-07-31 01:00

In previous blogs, I talked about some basics and presented some possible architectures for Alfresco, and I talked about the Clustering setup for the Alfresco Repository. In this one, I will work on the Alfresco Share layer. Therefore, if you are using another client like a CMIS/REST client or an ADF Application, it won't work that way, but you might or might not need Clustering at that layer; it depends on how the application is working.

The Alfresco Share Clustering is used only for the caches, so you could technically have multiple Share nodes working with a single Repository or a Repository Cluster without the Share Clustering. For that, you would have to disable the caches on the Share layer, because if you kept them enabled, you would eventually face issues. Alfresco introduced the Share Clustering to keep the caches in sync, so you don't have to disable them anymore. When needed, cache invalidation messages are sent from one Share node to all others; this includes runtime application properties changes as well as new/existing site/user dashboard changes.

Just like for the Repository part, it's really easy to set up the Share Clustering, so there is really no reason not to. It also uses Hazelcast, but it's not based on properties that you need to configure in the alfresco-global.properties (because it's a Share configuration): this one must be done in an XML file, and there is obviously no possibility to do that in the Alfresco Admin Console.

All Share configurations/customizations are put in the “$CATALINA_HOME/shared/classes/alfresco/web-extension” folder and this one is no exception. There are two possibilities for the Share Clustering communications:

  • Multicast
  • Unicast (TCP-IP in Hazelcast)

 

I. Multicast

If you do not know how many nodes will participate in your Share Cluster or if you want to be able to add more nodes in the future without having to change the previous nodes’ configuration, then you probably want to check and opt for the Multicast option. Just create a new file “$CATALINA_HOME/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml” and put this content inside it:

[alfresco@share_n1 ~]$ cat $CATALINA_HOME/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hz="http://www.hazelcast.com/schema/spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
                           http://www.hazelcast.com/schema/spring
                           http://www.hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd">

  <hz:topic id="topic" instance-ref="webframework.cluster.slingshot" name="share_hz_test"/>
  <hz:hazelcast id="webframework.cluster.slingshot">
    <hz:config>
      <hz:group name="slingshot" password="Sh4r3_hz_Test_pwd"/>
      <hz:network port="5801" port-auto-increment="false">
        <hz:join>
          <hz:multicast enabled="true" multicast-group="224.2.2.5" multicast-port="54327"/>
          <hz:tcp-ip enabled="false">
            <hz:members></hz:members>
          </hz:tcp-ip>
        </hz:join>
        <hz:interfaces enabled="false">
          <hz:interface></hz:interface>
        </hz:interfaces>
      </hz:network>
    </hz:config>
  </hz:hazelcast>

  <bean id="webframework.cluster.clusterservice" class="org.alfresco.web.site.ClusterTopicService" init-method="init">
    <property name="hazelcastInstance" ref="webframework.cluster.slingshot" />
    <property name="hazelcastTopicName">
      <value>share_hz_test</value>
    </property>
  </bean>

</beans>
[alfresco@share_n1 ~]$

 

In the above configuration, be sure to set a topic name (matching the hazelcastTopicName's value) as well as a group password that is specific to this environment, so you don't end up with a single Cluster with members coming from different environments. For the Share layer, it's less of an issue than for the Repository layer, but still. Be sure also to use a network port that isn't in use: it will be the port that Hazelcast binds itself to on the local host. For the Alfresco Clustering, we used 5701, so here it's 5801 for example.
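A quick way to verify that the chosen port is actually free on the local host before starting Share (a minimal sketch, assuming a Linux host where ss or netstat is available):

[alfresco@share_n1 ~]$ ss -lnt | grep 5801 || echo "port 5801 is free"
[alfresco@share_n1 ~]$ # or, with netstat:
[alfresco@share_n1 ~]$ netstat -an | grep 5801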

Not much more to say about this configuration, we just enabled the multicast with an IP and a port to be used and we disabled the tcp-ip one.

The interfaces section is disabled by default but you can enable it if you want to. If it's disabled, Hazelcast will list all local interfaces (127.0.0.1, local_IP1, local_IP2, …) and it will choose one from this list. If you want to force Hazelcast to use a specific local network interface, then enable this section and add that interface here. It can use the following nomenclature (IP only!):

  • 10.10.10.10: Hazelcast will try to bind on 10.10.10.10 only. If it’s not available, then it won’t start
  • 10.10.10.10-11: Hazelcast will try to bind on any IP within the range 10-11 so in this case 2 IPs: 10.10.10.10 or 10.10.10.11. If you have, let’s say, 5 IPs assigned to the local host and you don’t want Hazelcast to use 3 of these, then specify the ones that it can use and it will pick one from the list. This can also be used to have the same content for the custom-slingshot-application-context.xml on different hosts… One server with IP 10.10.10.10 and a second one with IP 10.10.10.11
  • 10.10.10.* or 10.10.*.*: Hazelcast will try to bind on any IP in this range, this is an extended version of the XX-YY range above

 

For most cases, keeping the interfaces disabled is sufficient since it will just pick one available. You might think that Hazelcast may bind itself to 127.0.0.1, technically it’s possible since it’s a local network interface but I have never seen it do so, so I assume that there is some kind of preferred order if another IP is available.

Membership in Hazelcast is based on “age”, meaning that the oldest member will be the one to lead. There are no predefined Master or Slave members, they are all equal, but the oldest/first member is the one that will check if new members are allowed to join (correct config) and, if so, it will send the information to all members that already joined so they are all aligned. If multicast is enabled, a multicast listener is started to listen for new membership requests.

 

II. Unicast

If you already know how many nodes will participate in your Share Cluster or if you prefer to avoid Multicast messages (there is no real need to overload your network with such things…), then it’s preferable to use Unicast messaging. For that purpose, just create the same file as above (“$CATALINA_HOME/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml“) but instead, use the tcp-ip section:

[alfresco@share_n1 ~]$ cat $CATALINA_HOME/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hz="http://www.hazelcast.com/schema/spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
                           http://www.hazelcast.com/schema/spring
                           http://www.hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd">

  <hz:topic id="topic" instance-ref="webframework.cluster.slingshot" name="share_hz_test"/>
  <hz:hazelcast id="webframework.cluster.slingshot">
    <hz:config>
      <hz:group name="slingshot" password="Sh4r3_hz_Test_pwd"/>
      <hz:network port="5801" port-auto-increment="false">
        <hz:join>
          <hz:multicast enabled="false" multicast-group="224.2.2.5" multicast-port="54327"/>
          <hz:tcp-ip enabled="true">
            <hz:members>share_n1.domain,share_n2.domain</hz:members>
          </hz:tcp-ip>
        </hz:join>
        <hz:interfaces enabled="false">
          <hz:interface></hz:interface>
        </hz:interfaces>
      </hz:network>
    </hz:config>
  </hz:hazelcast>

  <bean id="webframework.cluster.clusterservice" class="org.alfresco.web.site.ClusterTopicService" init-method="init">
    <property name="hazelcastInstance" ref="webframework.cluster.slingshot" />
    <property name="hazelcastTopicName">
      <value>share_hz_test</value>
    </property>
  </bean>

</beans>
[alfresco@share_n1 ~]$

 

The description is basically the same as for the Multicast part. The main difference is that the multicast was disabled, the tcp-ip was enabled and there is therefore a list of members that needs to be set. This is a comma-separated list of hostnames or IPs that Hazelcast will try to contact when it starts. Membership in case of Unicast is managed in the same way, except that the oldest/first member will listen for new membership requests over TCP-IP. Therefore, it's the same principle, it's just done differently.

Starting the first Share node in the Cluster will display the following information on the logs:

Jul 28, 2019 11:45:35 AM com.hazelcast.impl.AddressPicker
INFO: Resolving domain name 'share_n1.domain' to address(es): [127.0.0.1, 10.10.10.10]
Jul 28, 2019 11:45:35 AM com.hazelcast.impl.AddressPicker
INFO: Resolving domain name 'share_n2.domain' to address(es): [10.10.10.11]
Jul 28, 2019 11:45:35 AM com.hazelcast.impl.AddressPicker
INFO: Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [share_n1.domain/10.10.10.10, share_n2.domain/10.10.10.11, share_n1.domain/127.0.0.1]
Jul 28, 2019 11:45:35 AM com.hazelcast.impl.AddressPicker
INFO: Prefer IPv4 stack is true.
Jul 28, 2019 11:45:35 AM com.hazelcast.impl.AddressPicker
INFO: Picked Address[share_n1.domain]:5801, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5801], bind any local is true
Jul 28, 2019 11:45:36 AM com.hazelcast.system
INFO: [share_n1.domain]:5801 [slingshot] Hazelcast Community Edition 2.4 (20121017) starting at Address[share_n1.domain]:5801
Jul 28, 2019 11:45:36 AM com.hazelcast.system
INFO: [share_n1.domain]:5801 [slingshot] Copyright (C) 2008-2012 Hazelcast.com
Jul 28, 2019 11:45:36 AM com.hazelcast.impl.LifecycleServiceImpl
INFO: [share_n1.domain]:5801 [slingshot] Address[share_n1.domain]:5801 is STARTING
Jul 28, 2019 11:45:36 AM com.hazelcast.impl.TcpIpJoiner
INFO: [share_n1.domain]:5801 [slingshot] Connecting to possible member: Address[share_n2.domain]:5801
Jul 28, 2019 11:45:36 AM com.hazelcast.nio.SocketConnector
INFO: [share_n1.domain]:5801 [slingshot] Could not connect to: share_n2.domain/10.10.10.11:5801. Reason: ConnectException[Connection refused]
Jul 28, 2019 11:45:37 AM com.hazelcast.nio.SocketConnector
INFO: [share_n1.domain]:5801 [slingshot] Could not connect to: share_n2.domain/10.10.10.11:5801. Reason: ConnectException[Connection refused]
Jul 28, 2019 11:45:37 AM com.hazelcast.impl.TcpIpJoiner
INFO: [share_n1.domain]:5801 [slingshot]

Members [1] {
        Member [share_n1.domain]:5801 this
}

Jul 28, 2019 11:45:37 AM com.hazelcast.impl.LifecycleServiceImpl
INFO: [share_n1.domain]:5801 [slingshot] Address[share_n1.domain]:5801 is STARTED
2019-07-28 11:45:37,164  INFO  [web.site.ClusterTopicService] [localhost-startStop-1] Init complete for Hazelcast cluster - listening on topic: share_hz_test

 

Then starting a second node of the Share Cluster will display the following (still on the node1 logs):

Jul 28, 2019 11:48:31 AM com.hazelcast.nio.SocketAcceptor
INFO: [share_n1.domain]:5801 [slingshot] 5801 is accepting socket connection from /10.10.10.11:34191
Jul 28, 2019 11:48:31 AM com.hazelcast.nio.ConnectionManager
INFO: [share_n1.domain]:5801 [slingshot] 5801 accepted socket connection from /10.10.10.11:34191
Jul 28, 2019 11:48:38 AM com.hazelcast.cluster.ClusterManager
INFO: [share_n1.domain]:5801 [slingshot]

Members [2] {
        Member [share_n1.domain]:5801 this
        Member [share_n2.domain]:5801
}

 

 

Other posts of this series on Alfresco HA/Clustering:


Java JDK12: JEP 325: Switch Expressions

Tue, 2019-07-30 02:25
Eclipse setup

In order to test Java JDK 12, you will have to download at least Eclipse 4.11, then download JDK 12 from Oracle's web site and configure Eclipse to use this JDK for the project. Also install the JDK 12 support from the following repository link:

https://download.eclipse.org/eclipse/updates/4.11-P-builds

Edit the project's properties to set the “Java Compiler” compliance to 12 and “Enable preview features” to TRUE. Now you should be able to run JDK 12 examples. Here is a link to the complete setup:

https://marketplace.eclipse.org/content/java-12-support-eclipse-2019-03-411
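If you prefer to try the examples outside of Eclipse, the same preview switch exists on the command line. A minimal hedged sketch, assuming JDK 12's javac/java are on the PATH and a Demo.java file (hypothetical name) containing the new switch syntax:

$ javac --release 12 --enable-preview Demo.java
$ java --enable-preview Demo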

Switch expression

The switch expression receives a facelift in this version, which allows more readability and more flexibility. We can now:

  • Get rid of the break word in certain cases
  • Return a value from the switch
  • Put several values in one “case”

With the following example you can see how to get rid of the “break” and use a “case L ->” switch label. In addition, the switch returns a value which can be used after the expression.

String alias = switch (day) {
    case MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY 	-> "Week day";
    case SATURDAY, SUNDAY				                -> "Weekend";
};
System.out.println(alias);

The previous few lines replace the following ones:

switch (day) {
    case MONDAY:
        System.out.println("Week day");
        break;
    case TUESDAY:
        System.out.println("Week day");
        break;
    case WEDNESDAY:
        System.out.println("Week day");
        break;
    case THURSDAY:
        System.out.println("Week day");
        break;
    case FRIDAY:
        System.out.println("Week day");
        break;
    case SATURDAY:
        System.out.println("Weekend");
        break;
    case SUNDAY:
        System.out.println("Weekend");
        break;
}

Here the expression:

T result = switch (arg) {
    case L1 -> e1;
    case L2 -> e2;
    default -> e3;
};

When using a block of code you will still need a “break”:

int i = switch (day) {
    case MONDAY -> {
        System.out.println("Monday");
        break 0; // break with a value is needed here to return a result from the block
    }
    default -> 1;
};

Thanks to this new notation, we can now return values with a switch expression. In addition, it makes big switch expressions more readable. Note that you can still use the “old” way, thus old code will still compile with the new JDK. For the moment it is a preview feature of JDK 12, which is a non-LTS release, but if the new expression is validated by tests and feedback, it will definitely be part of future releases like the LTS versions and you will be able to use it then.


Alfresco Clustering – Repository

Tue, 2019-07-30 01:00

In a previous blog, I talked about some basics and presented some possible architectures for Alfresco. Now that this introduction has been done, let's dig into the real blogs about how to set up a HA/Clustering Alfresco environment. In this blog in particular, I will talk about the Repository layer.

For the Repository Clustering, there are three prerequisites (and that’s all you need):

  • A valid license which includes the Repository Clustering
  • A shared file system which is accessible from all Alfresco nodes in the Cluster. This is usually a NAS accessed via NFS
  • A shared database

 

Clustering the Repository part is really simple to do: you just need to put the correct properties in the alfresco-global.properties file. Of course, you could also manage it all from the Alfresco Admin Console but that’s not recommended, you should really always use the alfresco-global.properties by default. The Alfresco Repository Clustering is using Hazelcast. It was using JGroups and EHCache as well before Alfresco 4.2 but now it’s just Hazelcast. So to define an Alfresco Cluster, simply put the following configuration in the alfresco-global.properties of the Alfresco node1:

[alfresco@alf_n1 ~]$ getent hosts `hostname` | awk '{ print $1 }'
10.10.10.10
[alfresco@alf_n1 ~]$
[alfresco@alf_n1 ~]$ cat $CATALINA_HOME/shared/classes/alfresco-global.properties
...
### Content Store
dir.root=/shared_storage/alf_data
...
### DB
db.username=alfresco
db.password=My+P4ssw0rd
db.name=alfresco
db.host=db_vip
## MySQL
#db.port=3306
#db.driver=com.mysql.jdbc.Driver
#db.url=jdbc:mysql://${db.host}:${db.port}/${db.name}?useUnicode=yes&characterEncoding=UTF-8
#db.pool.validate.query=SELECT 1
## PostgreSQL
db.driver=org.postgresql.Driver
db.port=5432
db.url=jdbc:postgresql://${db.host}:${db.port}/${db.name}
db.pool.validate.query=SELECT 1
## Oracle
#db.driver=oracle.jdbc.OracleDriver
#db.port=1521
#db.url=jdbc:oracle:thin:@${db.host}:${db.port}:${db.name}
#db.pool.validate.query=SELECT 1 FROM DUAL
...
### Clustering
alfresco.cluster.enabled=true
alfresco.cluster.interface=10.10.10.10-11
alfresco.cluster.nodetype=Alfresco_node1
alfresco.hazelcast.password=Alfr3sc0_hz_Test_pwd
alfresco.hazelcast.port=5701
alfresco.hazelcast.autoinc.port=false
alfresco.hazelcast.max.no.heartbeat.seconds=15
...
[alfresco@alf_n1 ~]$

 

And for the Alfresco node2, you can use the same content:

[alfresco@alf_n2 ~]$ getent hosts `hostname` | awk '{ print $1 }'
10.10.10.11
[alfresco@alf_n2 ~]$
[alfresco@alf_n2 ~]$ cat $CATALINA_HOME/shared/classes/alfresco-global.properties
...
### Content Store
dir.root=/shared_storage/alf_data
...
### DB
db.username=alfresco
db.password=My+P4ssw0rd
db.name=alfresco
db.host=db_vip
## MySQL
#db.port=3306
#db.driver=com.mysql.jdbc.Driver
#db.url=jdbc:mysql://${db.host}:${db.port}/${db.name}?useUnicode=yes&characterEncoding=UTF-8
#db.pool.validate.query=SELECT 1
## PostgreSQL
db.driver=org.postgresql.Driver
db.port=5432
db.url=jdbc:postgresql://${db.host}:${db.port}/${db.name}
db.pool.validate.query=SELECT 1
## Oracle
#db.driver=oracle.jdbc.OracleDriver
#db.port=1521
#db.url=jdbc:oracle:thin:@${db.host}:${db.port}:${db.name}
#db.pool.validate.query=SELECT 1 FROM DUAL
...
### Clustering
alfresco.cluster.enabled=true
alfresco.cluster.interface=10.10.10.10-11
alfresco.cluster.nodetype=Alfresco_node2
alfresco.hazelcast.password=Alfr3sc0_hz_Test_pwd
alfresco.hazelcast.port=5701
alfresco.hazelcast.autoinc.port=false
alfresco.hazelcast.max.no.heartbeat.seconds=15
...
[alfresco@alf_n2 ~]$

 

Description of the Clustering parameters:

  • alfresco.cluster.enabled: Whether or not you want to enable the Repository Clustering for the local Alfresco node. The default value is false. You will want to set that to true for all Repository nodes that will be used by Share or any other client. If the Repository is only used for Solr Tracking, you can leave that to false
  • alfresco.cluster.interface: This is the network interface on which Hazelcast will listen for Clustering messages. This has to be an IP, it can’t be a hostname. To keep things simple and to have the same alfresco-global.properties on all Alfresco nodes however, it is possible to use a specific nomenclature:
    • 10.10.10.10: Hazelcast will try to bind on 10.10.10.10 only. If it’s not available, then it won’t start
    • 10.10.10.10-11: Hazelcast will try to bind on any IP within the range 10-11 so in this case 2 IPs: 10.10.10.10 or 10.10.10.11. If you have, let’s say, 4 IPs assigned to the local host and you don’t want Hazelcast to use 2 of these, then specify the ones that it can use and it will pick one from the list. This can also be used to have the same content for the alfresco-global.properties on different hosts… One server with IP 10.10.10.10 and a second one with IP 10.10.10.11
    • 10.10.10.* or 10.10.*.*: Hazelcast will try to bind on any IP in this range, this is an extended version of the XX-YY range above
  • alfresco.cluster.nodetype: A human-friendly string to represent the local Alfresco node. It doesn’t have any use for Alfresco, that’s really more for you. It is for example interesting to put a specific string for Alfresco node that won’t take part in the Clustering but that are still using the same Content Store and Database (like a Repository dedicated for the Solr Tracking, as mentioned above)
  • alfresco.hazelcast.password: The password to use for the Alfresco Repository Cluster. You need to use the same password for all members of the same Cluster. You should as well try to use a different password for each Cluster that you might have if they are in the same network (DEV/TEST/PROD for example), otherwise it will get ugly
  • alfresco.hazelcast.port: The default port that will be used for Clustering messages between the different members of the Cluster
  • alfresco.hazelcast.autoinc.port: Whether or not you want to allow Hazelcast to find another free port in case the default port (“alfresco.hazelcast.port”) is currently used. It will increment the port by 1 each time. You should really set this to false and just use the default port, to have full control over the channels that Clustering communications are using otherwise it might get messy as well
  • alfresco.hazelcast.max.no.heartbeat.seconds: The maximum time in seconds allowed between two heartbeats. If there is no heartbeat during this period of time, Alfresco will assume the remote node isn't running/available

 

As you can see above, it's really simple to add Clustering to an Alfresco Repository. Since you can (should?) have the same set of properties (except the nodetype string maybe), it also really simplifies the deployment… If you are familiar with other Document Management Systems like Documentum for example, then you understand the complexity of some of these solutions! If you compare that to Alfresco, it's like walking on the street versus walking on the moon, where you obviously first need to go to the moon… Anyway, once it's done, the logs of the Alfresco Repository node1 will display something like this when you start it:

2019-07-20 15:14:25,401  INFO  [cluster.core.ClusteringBootstrap] [localhost-startStop-1] Cluster started, name: MainRepository-<generated_id>
2019-07-20 15:14:25,405  INFO  [cluster.core.ClusteringBootstrap] [localhost-startStop-1] Current cluster members:
  10.10.10.10:5701 (hostname: alf_n1)

 

Wait for the Repository node1 to be fully started and, once done, you can start the Repository node2 (the nodes normally need to be started sequentially). You will see in the logs of the Repository node1 that another node automatically joined the Cluster:

2019-07-20 15:15:06,528  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-<generated_id>.event-3] Member joined: 10.10.10.11:5701 (hostname: alf_n2)
2019-07-20 15:15:06,529  INFO  [cluster.core.MembershipChangeLogger] [hz._hzInstance_1_MainRepository-<generated_id>.event-3] Current cluster members:
  10.10.10.10:5701 (hostname: alf_n1)
  10.10.10.11:5701 (hostname: alf_n2)

 

On the logs of the Repository node2, you can see directly at the initialization of the Hazelcast Cluster that the two nodes are available.

If you don’t want to check the logs, you can see pretty much the same thing from the Alfresco Admin Console. By accessing “http(s)://<hostname>:<port>/alfresco/s/enterprise/admin/admin-clustering“, you can see currently available cluster members (online nodes), non-available cluster members (offline nodes) as well as connected non-cluster members (nodes using the same DB & Content Store but with “alfresco.cluster.enabled=false”, for example to dedicate a Repository to Solr Tracking).

Alfresco also provides a small utility to check the health of the cluster which will basically ensure that the communication between each member is successful. This utility can be accessed at “http(s)://<hostname>:<port>/alfresco/s/enterprise/admin/admin-clustering-test“. It is useful to include a quick check using this utility in a monitoring solution for example, to ensure that the cluster is healthy.
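As an example of such a check, a simple HTTP probe against that utility can be scripted (a hedged sketch; the hostname, port and admin credentials are placeholders for your environment):

[alfresco@alf_n1 ~]$ # A 200 response (with no failure reported in the body) means the cluster test page answered
[alfresco@alf_n1 ~]$ curl -s -u admin:admin_pwd -o /dev/null -w "%{http_code}\n" "https://alf_n1.domain:8443/alfresco/s/enterprise/admin/admin-clustering-test"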

 


CloudBees DevOps Playground – Hands On with JenkinsX

Mon, 2019-07-29 16:32

Last week, we had the chance to attend the CloudBees DevOps Playground in London. The event was a presentation and a Hands-On on Jenkins X done by one of the most popular guys from CloudBees, Gareth Evans.

Before taking an interest in Jenkins X, we focused most of our time on the Docker and Kubernetes part. Over the last months, we greatly enhanced our skills in the administration of Kubernetes clusters and the deployment of applications, especially the Documentum stack as well as WebLogic.

Jenkins X is quite a new technology in the landscape of automated deployment, and we faced difficulties finding a workshop/training related to Jenkins X administration and usage. So we decided to go to London for this CloudBees DevOps Hands-On.

Working in middleware infrastructures, between system engineers and applications, Jenkins X makes complete sense for us to automate the creation of Kubernetes infrastructures, in terms of both cluster and application deployment.

What’s Jenkins X?

Basically, Jenkins X automates the whole development process end to end for containerized applications based on Docker and Kubernetes.

Overview of Jenkins X:

  • Jenkins X provides an automated CI/CD solution for Kubernetes
  • Buildpacks to quickly create new applications
  • Uses GitOps to manage promotion between Environments
  • Creates Preview Environments on Pull Requests
  • Provides control via ChatOps and feedback on Pull Requests
  • Improves developers’ productivity
  • It is open source
  • Microservices architecture
  • Designed for extension
  • Relies on k8s CRDs

JX Topologies:

Jenkins X can work in 2 modes: Static and Serverless.

Cloud-Native approach:
Our goal will be to use Jenkins X to automate the deployment of containerized applications on a Kubernetes cluster.
Jenkins X enables real collaboration between system engineers and application teams, with a focus on making development teams productive through automation and DevOps best practices.

We will achieve the automation of CI/CD pipelines using Jenkins X as follows. This is how Jenkins X works (big picture); we will see later how to install JX with the different methods, on the cloud or on-premise, and how to build CI/CD pipelines.
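As a teaser of what that automation looks like in practice, here is a hedged sketch of the kind of jx commands involved (command names taken from the Jenkins X CLI as it stood at the time; the cluster name and application name are placeholders):

$ # Create a Kubernetes cluster with Jenkins X installed on it (GKE in this example)
$ jx create cluster gke --cluster-name=jx-demo
$ # Generate a new application from a quickstart, wired into the CI/CD pipelines
$ jx create quickstart
$ # Promote a build of the application to the production environment
$ jx promote myapp --version 1.0.1 --env production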


Alfresco Clustering – Basis & Architectures

Mon, 2019-07-29 01:00

This blog will be the first of a series on Alfresco HA/Clustering topics. It's been too long since I posted anything related to Alfresco, so I thought about writing a few blogs about my experience with setting up more or less complex HA/Clustering infrastructures. So, let's start this first part with an introduction to Alfresco HA/Clustering.

If you want to set up a HA/Clustering environment, you will first have to think about where you want to go exactly. Alfresco is composed of several components, so “what do you want to achieve exactly?” would probably be the first question to ask.

Alfresco offers a lot of possibilities, you can more or less do whatever you want. That's really great, but it also means that you should plan what you want to do first. Do you just want a simple HA architecture for Share+Repository where you can live without Solr for a few minutes/hours (in case of issues), or do you absolutely want all components to be always available? Or maybe you want a HA architecture which is better suited for high throughput? Obviously, there might be some cost details that need to be taken into consideration, linked to the resources but also to the licenses: the Alfresco Clustering license itself but also the Index Engine license if you go for separated Solr Servers.

That's what you need to define first to avoid losing time changing configurations and adding more components into the picture later. Alternatively (and that's something I will try to cover as much as I can), it's also possible to set up an environment which will allow you to add more components (at least some of them…) as needed without having to change your HA/Clustering configuration, if you are doing it right from the start and if you don't change the architecture itself too much.

I mentioned earlier the components of Alfresco (Alfresco Content Services, not the company), these are the ones we are usually talking about:

  • *Front-end (Apache HTTPD, Nginx, …)
  • *ActiveMQ
  • Alfresco PDF Renderer
  • Database
  • File System
  • ImageMagick
  • Java
  • LibreOffice
  • *Share (Tomcat)
  • *Repository (Tomcat)
  • *Solr6 (Jetty)

 

In this series of blogs, I won't talk about the Alfresco PDF Renderer, ImageMagick & Java because these are just simple binaries/executables that need to be available from the Repository side. For LibreOffice, it's usually Alfresco that is managing it directly (multi-processes, restart if crash, and so on). It wouldn't really make sense to talk about these in blogs related to Clustering. I will also disregard the Database and File System ones since they are usually out of my scope. The Database is usually installed & managed by my colleagues who are DBAs, they are much better at that than me. That leaves us with all components with an asterisk (*). I will update this list with links to the different blogs.

Before jumping into the first component, which will be the subject of the next blog, I wanted to go through some possible architectures for Alfresco. There are a lot of schemas available on the internet but it's often the same architecture that is presented, so I thought I would take some time to represent, in my own way, what Alfresco's architecture could look like.

In the below schemas, I represented the main components: Front-end, Share, Repository, Solr, Database & File System (Data) as little boxes. As mentioned previously, I won’t talk about the Database & File System so I just represented them once to see the communications with these but what is behind their boxes can be anything (with HA/Clustering or not). The arrows represent the way communications are initiated: an arrow in a single direction “A->B” means that B is never initiating a communication with A. Boxes that are glued together represent all components installed on the same host (a physical server, a VM, a container or whatever).

 

Alfresco Architecture N°1: This is the simplest architecture for Alfresco. As you can see, it's not a HA/Clustering architecture but I decided to start small. I added a Front-end (even if it's not mandatory) because it's a best practice and I would not install Alfresco without it. Nothing specific to say on this architecture, it's just simple.

 

Alfresco Architecture N°2: The first thing to do if you have the simplest architecture in place (N°1) and you start seeing some resource contention is to split the components and more specifically to install Solr separately. This should really be the minimal architecture to use, whenever possible.

 

Alfresco Architecture N°3: This is the first HA/Clustering architecture. It starts small as you can see with just two nodes for each Front-end/Share/Repository stack with a Load Balancer to dispatch the load on each side for an Active/Active solution. The dotted grey lines represent the Clustering communications. In this architecture, there is therefore a Clustering for Share and another one for the Repository layer. The Front-end doesn't need Clustering since it just forwards the communications but the session itself is on the Tomcat (Share/Repository) side. There is only one Solr node and therefore both Repository boxes will communicate with the Solr node (through the Front-end or not). Between the Repository and Solr, there is one bidirectional arrow and another one unidirectional. That's because both Repository boxes will initiate searches but the Solr will do tracking to index new content with only one Repository: this isn't optimal.

 

Alfresco Architecture N°4: To solve this small issue with Solr tracking, we can add a second Load Balancer in between so that the Solr tracking can target any Repository node. The first bottleneck you will encounter in Alfresco is usually the Repository because a lot of things are happening in the background at that layer. Therefore, this architecture is usually the simplest HA/Clustering solution that you will want to set up.

 

Alfresco Architecture N°5: If you are facing some performance issues with Solr or if you want all components to be in HA, then you will have to duplicate the Solr as well. Between the two Solr nodes, I put a Clustering link, that's in case you are using Solr Sharding. If you are using the default cores (alfresco and archive), then there is no communication between distinct Solr nodes. If you are using Solr Sharding and if you want a HA architecture, then you will have the same shards on both Solr nodes and in this case, there will be communications between the Solr nodes, it's not really a Clustering so to speak, that's how Solr Sharding is working but I still used the same representation.

 

Alfresco Architecture N°6: As mentioned previously (for the N°4), the Repository is usually the bottleneck. To reduce the load on this layer, it is possible to do several things. The first possibility is to install another Repository and dedicate it to the Solr Tracking. As you can see above, the communications aren't bidirectional anymore but only unidirectional. Searches will come from the two Repository nodes that are in the Cluster and Solr Tracking will use the separated/dedicated Repository. This third Repository can then be set in read-only, the jobs and services can be disabled, the Clustering can be disabled as well (so it uses the same DB but it's not part of the Clustering communications because it doesn't have to), and so on… I put this third Repository as a standalone box but obviously you can install it with one of the two Solr nodes.

 

Alfresco Architecture N°7: The next step can be to add another read-only Repository and put these two nodes side by side with the Solr nodes. This is to only have localhost communications for the Solr Tracking which is therefore a little bit easier to secure.

 

Alfresco Architecture N°8: The previous architectures (N°6 & N°7) introduced a new single point of failure so to fix this, there is only one way: add a new Load Balancer between the Solr and the Repository for the tracking. Behind the Load Balancer, there are two solutions: keep the fourth Repository which is also in read-only or use a fallback to the Repository node1/node2 in case the read-only Repository (node3) isn't available. For that purpose, the Load Balancer should be in, respectively, Active/Active or Active/Passive mode. As you can see, I chose to represent the first one.

 

These were a few possible architectures. You can obviously add more nodes if you want to, to handle more load. There are many other solutions so have fun designing the best one, according to your requirements.

 


dbvisit 9: Automatic Failover

Sat, 2019-07-27 06:43

dbvisit 9 was released a few months ago, and one new feature I tested is the Automatic Failover. In this blog I assume that dbvisit 9 is already installed and that the standby database is already created. Indeed, I will describe neither the dbvisit installation nor the standby creation, as they are the same as in previous versions.
For more info about dbvisit installation and/or dbvisit standby creation, please see these steps in my previous blog or in the dbvisit documentation.
The new Automatic Failover feature requires installing an observer, whose main functions are:
-Provide remote monitoring of existing DDCs, and inform the DBA of problems in close to real-time
-Automatically perform a Failover of the DDC if previously-specified conditions are met.

We will describe observer installation and configuration later

We describe below the configuration we are using
dbvisit1: primary server with Oracle 19c
dbvisit2: standby server with Oracle 19c
dbvisitconsole : Host of the dbvisit console (dbvserver) and for the observer

As specified earlier, we need to install an observer. It is very easy to do: just launch the install-dbvisit executable and follow the instructions.

[oracle@dbvisitconsole installer]$ pwd
/home/oracle/dbvisit/installer
[oracle@dbvisitconsole installer]$ ./install-dbvisit

-----------------------------------------------------------
    Welcome to the Dbvisit software installer.
-----------------------------------------------------------

    It is recommended to make a backup of our current Dbvisit software
    location (Dbvisit Base location) for rollback purposes.

    Installer Directory /home/oracle/dbvisit

>>> Please specify the Dbvisit installation directory (Dbvisit Base).

    The various Dbvisit products and components - such as Dbvisit Standby,
    Dbvisit Dbvnet will be installed in the appropriate subdirectories of
    this path.

    Enter a custom value or press ENTER to accept default [/usr/dbvisit]:
     > /u01/app/dbvisit
    DBVISIT_BASE = /u01/app/dbvisit

    -----------------------------------------------------------
    Component      Installer Version   Installed Version
    -----------------------------------------------------------
    standby        9.0.02_0_gbd40c486                                not installed                 
    dbvnet         9.0.02_0_gbd40c486                                not installed                 
    dbvagent       9.0.02_0_gbd40c486                                not installed                 
    dbvserver      9.0.02_0_gbd40c486                                9.0.02_0_gbd40c486            
    observer       1.02                                              not installed                 

    -----------------------------------------------------------

    What action would you like to perform?
       1 - Install component(s)
       2 - Uninstall component(s)
       3 - Exit

    Your choice: 1

    Choose component(s):
       1 - Core Components (Dbvisit Standby Cli, Dbvnet, Dbvagent)
       2 - Dbvisit Standby Core (Command Line Interface)
       3 - Dbvnet (Dbvisit Network Communication)
       4 - Dbvagent (Dbvisit Agent)
       5 - Dbvserver (Dbvisit Central Console) - Not available on Solaris/AIX
       6 - Dbvisit Observer (Automatic Failover Option) - Not available on Solaris/AIX
    Press ENTER to exit Installer

    Your choice: 6

-----------------------------------------------------------
    Summary of the Dbvisit OBSERVER configuration
-----------------------------------------------------------
    DBVISIT_BASE /u01/app/dbvisit

    Press ENTER to continue

-----------------------------------------------------------
    About to install Dbvisit OBSERVER
-----------------------------------------------------------
    Component observer installed.

    -----------------------------------------------------------
    Component      Installer Version   Installed Version
    -----------------------------------------------------------
    standby        9.0.02_0_gbd40c486                                not installed                 
    dbvnet         9.0.02_0_gbd40c486                                not installed                 
    dbvagent       9.0.02_0_gbd40c486                                not installed                 
    dbvserver      9.0.02_0_gbd40c486                                9.0.02_0_gbd40c486            
    observer       1.02                                              1.02                          

    -----------------------------------------------------------

    What action would you like to perform?
       1 - Install component(s)
       2 - Uninstall component(s)
       3 - Exit

    Your choice: 3

>>> Installation completed
    Install log /tmp/dbvisit_install.log.201907231647.
[oracle@dbvisitconsole installer]$

And once the installation is done, we can start it:

[oracle@dbvisitconsole observer]$ ./observersvc start &
[1] 2866

[oracle@dbvisitconsole observer]$ ps -ef | grep obser
oracle    2866  2275  0 14:25 pts/0    00:00:01 ./observersvc start
oracle    2921  2275  0 14:29 pts/0    00:00:00 grep --color=auto obser
[oracle@dbvisitconsole observer]$

After starting the observer, we have to add the observer server. This is done from the MANAGE CONFIGURATION tab of the menu.

From the Configuration tab, choose NEW on the left to add a dbvisit observer.

Then fill in the information. Note that the default passphrase for the observer is admin900. Then save.

To have the observer monitor our configuration, let's click on Monitor.

Then specify the poll interval and the number of retries before a failover happens. In our case:

The observer will monitor the configuration every 60 seconds and will retry 5 times if there is any error.
If, after 5 minutes (5 × 60 seconds), the problem is not fixed, then an automatic failover will happen.

The observer logfile is located on the observer server

[oracle@dbvisitconsole log]$ pwd
/u01/app/dbvisit/observer/log
[oracle@dbvisitconsole log]$ ls -l
total 8
-rw-r--r--. 1 oracle oinstall 1109 Jul 25 15:24 observer.log
-rw-r--r--. 1 oracle oinstall   97 Jul 25 15:24 orcl_1_observer.log
[oracle@dbvisitconsole log]$

[oracle@dbvisitconsole log]$ tail -f orcl_1_observer.log
2019/07/25 13:24:46 DDC: DDC#1(orcl): Started watchdog: Watchdog successfully started monitoring

Now let's break the primary database; normally, a failover should happen after 5 minutes.

[oracle@dbvisit1 log]$ ps -ef | grep pmon
oracle    1887     1  0 14:03 ?        00:00:00 ora_pmon_orcl
oracle   18199  1733  0 15:31 pts/0    00:00:00 grep --color=auto pmon
[oracle@dbvisit1 log]$ kill -9 1887
[oracle@dbvisit1 log]$ ps -ef | grep pmon
oracle   18304  1733  0 15:32 pts/0    00:00:00 grep --color=auto pmon
[oracle@dbvisit1 log]$

In the observer logfile we can see that the standby was promoted after 5 retries.

[oracle@dbvisitconsole log]$ tail -f orcl_1_observer.log
2019/07/25 13:24:46 DDC: DDC#1(orcl): Started watchdog: Watchdog successfully started monitoring
2019/07/25 13:33:51 DDC: DDC#1(orcl): rules failing (1/5): primary: error on dbvisit1:7891: unexpected database status for DDC orcl: got: "Database is down", expected: "Regular database open in read write mode"
2019/07/25 13:34:51 DDC: DDC#1(orcl): rules failing (2/5): primary: error on dbvisit1:7891: unexpected database status for DDC orcl: got: "Database is down", expected: "Regular database open in read write mode"
2019/07/25 13:35:51 DDC: DDC#1(orcl): rules failing (3/5): primary: error on dbvisit1:7891: unexpected database status for DDC orcl: got: "Database is down", expected: "Regular database open in read write mode"
2019/07/25 13:36:51 DDC: DDC#1(orcl): rules failing (4/5): primary: error on dbvisit1:7891: unexpected database status for DDC orcl: got: "Database is down", expected: "Regular database open in read write mode"
2019/07/25 13:37:51 DDC: DDC#1(orcl): rules failing (5/5): primary: error on dbvisit1:7891: unexpected database status for DDC orcl: got: "Database is down", expected: "Regular database open in read write mode"
2019/07/25 13:37:51 DDC: DDC#1(orcl): configuration failed after 5 retries: primary: error on dbvisit1:7891: unexpected database status for DDC orcl: got: "Database is down", expected: "Regular database open in read write mode"
2019/07/25 13:37:51 DDC: DDC#1(orcl): watchdog shutting down: activation imminent
2019/07/25 13:37:51 DDC: DDC#1(orcl): ACTIVATION started: conditions for activation satisfied
2019/07/25 13:38:41 DDC: DDC#1(orcl): ACTIVATION successful: ACTIVATION OK: standby activated, activation took: 50.043794192s

And we can verify that the standby is now open in read write mode

[oracle@dbvisit2 trace]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Jul 25 15:49:38 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select db_unique_name,open_mode from v$database;

DB_UNIQUE_NAME                 OPEN_MODE
------------------------------ --------------------
orcl                           READ WRITE

SQL>

Note that we can use a user-defined script with the observer. For more information, please see the Dbvisit documentation.

Cet article dbvisit 9: Automatic Failover est apparu en premier sur Blog dbi services.

Alfresco – ActiveMQ basic setup

Fri, 2019-07-26 08:49

Apache ActiveMQ

ActiveMQ is an open source message broker from the Apache Software Foundation that implements the Java Message Service (JMS) and supports a lot of protocols. In Alfresco 5, ActiveMQ was introduced as a new, optional component in the stack. At the beginning, it was only used for "side" features like Alfresco Analytics or Alfresco Media Management in the early Alfresco 5.0. In Alfresco 6.0, ActiveMQ was still used for Alfresco Media Management but also for the Alfresco Sync Service. It's only starting with Alfresco 6.1, released last February, that ActiveMQ became a required component, used for the same things but now also for transformations.

The Alfresco documentation doesn't really describe how to install or configure ActiveMQ; it just explains how to connect Alfresco to it. Therefore, I thought I would write a small blog about how to do a basic installation of ActiveMQ for use with Alfresco.

Alfresco 6.1 supports ActiveMQ v5.15.6, so that's the one I will be using for this blog as an example.

First let’s start with defining some environment variables that will be used to know where to put ActiveMQ binaries and data:

[alfresco@mq_n1 ~]$ echo "export ACTIVEMQ_HOME=/opt/activemq" >> ~/.profile
[alfresco@mq_n1 ~]$ echo "export ACTIVEMQ_DATA=\$ACTIVEMQ_HOME/data" >> ~/.profile
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ grep "ACTIVEMQ" ~/.profile
export ACTIVEMQ_HOME=/opt/activemq
export ACTIVEMQ_DATA=$ACTIVEMQ_HOME/data
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ source ~/.profile
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ echo $ACTIVEMQ_DATA
/opt/activemq/data
[alfresco@mq_n1 ~]$

 

I'm usually using symlinks for all the components so that I can keep a generic path in case of upgrades, and so on. So, let's download the software and put everything where it should be:

[alfresco@mq_n1 ~]$ activemq_version="5.15.6"
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ wget http://archive.apache.org/dist/activemq/${activemq_version}/apache-activemq-${activemq_version}-bin.tar.gz
--2019-07-25 16:55:23--  http://archive.apache.org/dist/activemq/5.15.6/apache-activemq-5.15.6-bin.tar.gz
Resolving archive.apache.org... 163.172.17.199
Connecting to archive.apache.org|163.172.17.199|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58556801 (56M) [application/x-gzip]
Saving to: ‘apache-activemq-5.15.6-bin.tar.gz’

apache-activemq-5.15.6-bin.tar.gz     100%[=======================================================================>]  55.84M  1.62MB/s    in 35s

2019-07-25 16:55:58 (1.60 MB/s) - ‘apache-activemq-5.15.6-bin.tar.gz’ saved [58556801/58556801]

[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ tar -xzf apache-activemq-${activemq_version}-bin.tar.gz
[alfresco@mq_n1 ~]$ mkdir -p $ACTIVEMQ_HOME-${activemq_version}
[alfresco@mq_n1 ~]$ ln -s $ACTIVEMQ_HOME-${activemq_version} $ACTIVEMQ_HOME
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ ls -l $ACTIVEMQ_HOME/.. | grep -i activemq
lrwxr-xr-x   1 alfresco  alfresco        31 Jul 25 17:04 activemq -> /opt/activemq-5.15.6
drwxr-xr-x   2 alfresco  alfresco        64 Jul 25 17:03 activemq-5.15.6
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ rm -rf ./apache-activemq-${activemq_version}/data
[alfresco@mq_n1 ~]$ mkdir -p $ACTIVEMQ_DATA
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ mv apache-activemq-${activemq_version}/* $ACTIVEMQ_HOME/

 

Once that is done and before starting ActiveMQ for the first time, there are still some configurations to be done. It is technically possible to add specific authentication for communications between Alfresco and ActiveMQ or to set up the communications in SSL, for example. It depends on the usage you will have for ActiveMQ, but as a minimal configuration for use with Alfresco, I believe that the default users ("guest" to access the broker & "user" to access the web console) should at least be removed and the admin password changed:

[alfresco@mq_n1 ~]$ activemq_admin_pwd="Act1v3MQ_pwd"
[alfresco@mq_n1 ~]$ activemq_broker_name="`hostname -s`"
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Remove user "user" from the web console
[alfresco@mq_n1 ~]$ sed -i "/^user:[[:space:]]*.*/d" $ACTIVEMQ_HOME/conf/jetty-realm.properties
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Remove user "guest" from the broker
[alfresco@mq_n1 ~]$ sed -i "/^guest.*/d" $ACTIVEMQ_HOME/conf/credentials.properties
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Change admin password
[alfresco@mq_n1 ~]$ sed -i "s/^admin=.*/admin=${activemq_admin_pwd}\n/" $ACTIVEMQ_HOME/conf/users.properties
[alfresco@mq_n1 ~]$ sed -i "s/^admin.*/admin: ${activemq_admin_pwd}, admin/" $ACTIVEMQ_HOME/conf/jetty-realm.properties
[alfresco@mq_n1 ~]$ sed -i "s/^activemq.username=.*/activemq.username=admin/" $ACTIVEMQ_HOME/conf/credentials.properties
[alfresco@mq_n1 ~]$ sed -i "s/^activemq.password=.*/activemq.password=${activemq_admin_pwd}/" $ACTIVEMQ_HOME/conf/credentials.properties
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ grep -E "brokerName|storeUsage |tempUsage " $ACTIVEMQ_HOME/conf/activemq.xml
    <broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" dataDirectory="${activemq.data}">
                <storeUsage limit="100 gb"/>
                <tempUsage limit="50 gb"/>
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Set broker name & allowed usage
[alfresco@mq_n1 ~]$ sed -i "s/brokerName=\"[^"]*\"/brokerName=\"${activemq_broker_name}\"/" $ACTIVEMQ_HOME/conf/activemq.xml
[alfresco@mq_n1 ~]$ sed -i 's,storeUsage limit="[^"]*",storeUsage limit="10 gb",' $ACTIVEMQ_HOME/conf/activemq.xml
[alfresco@mq_n1 ~]$ sed -i 's,tempUsage limit="[^"]*",tempUsage limit="5 gb",' $ACTIVEMQ_HOME/conf/activemq.xml
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ grep -E "brokerName|storeUsage |tempUsage " $ACTIVEMQ_HOME/conf/activemq.xml
    <broker xmlns="http://activemq.apache.org/schema/core" brokerName="mq_n1" dataDirectory="${activemq.data}">
                    <storeUsage limit="10 gb"/>
                    <tempUsage limit="5 gb"/>
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ chmod -R o-rwx $ACTIVEMQ_HOME
[alfresco@mq_n1 ~]$ chmod -R o-rwx $ACTIVEMQ_DATA

 

So above, I set a specific name for the broker; that's mainly to differentiate them if you expect to have several brokers at some point. I also changed the default storeUsage and tempUsage, mainly to show how it's done, because these two parameters define the limits that ActiveMQ will be able to use on the file system. I believe the defaults are way too much for ActiveMQ's usage in Alfresco, so I always reduce these or use a percentage as value (percentLimit).

With the default configuration, ActiveMQ uses "${activemq.data}" for the data directory, which actually resolves to the "$ACTIVEMQ_DATA" environment variable, if present (otherwise it defaults to $ACTIVEMQ_HOME/data). That's the reason why I set this environment variable: it makes it possible to define a different data folder without having to change the default configuration. This data folder will mainly contain the ActiveMQ logs, the PID file and the KahaDB for the persistence adapter.

Finally creating a service for ActiveMQ and starting it is pretty easy as well:

[alfresco@mq_n1 ~]$ cat > activemq.service << EOF
[Unit]
Description=ActiveMQ service

[Service]
Type=forking
ExecStart=###ACTIVEMQ_HOME###/bin/activemq start
ExecStop=###ACTIVEMQ_HOME###/bin/activemq stop
Restart=always
User=alfresco
WorkingDirectory=###ACTIVEMQ_DATA###
LimitNOFILE=8192:65536

[Install]
WantedBy=multi-user.target
EOF
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ sed -i "s,###ACTIVEMQ_HOME###,${ACTIVEMQ_HOME}," activemq.service
[alfresco@mq_n1 ~]$ sed -i "s,###ACTIVEMQ_DATA###,${ACTIVEMQ_DATA}," activemq.service
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ sudo cp activemq.service /etc/systemd/system/
[alfresco@mq_n1 ~]$ rm activemq.service
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ sudo systemctl enable activemq.service
[alfresco@mq_n1 ~]$ sudo systemctl daemon-reload
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ sudo systemctl start activemq.service
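
To verify that the broker came up as expected, a few quick checks can be done right after the start. This is just a minimal sketch: 61616 (OpenWire) and 8161 (web console) are the ActiveMQ defaults and will differ if you changed them, and the log location assumes the default log4j configuration:

[alfresco@mq_n1 ~]$ sudo systemctl status activemq.service
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Broker (61616) and web console (8161) listeners
[alfresco@mq_n1 ~]$ ss -tln | grep -E ":61616|:8161"
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # With ACTIVEMQ_DATA set above, the broker log ends up in the data folder
[alfresco@mq_n1 ~]$ tail -n 20 $ACTIVEMQ_DATA/activemq.log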

 

Once ActiveMQ is set up as you want, registering it in Alfresco is very easy:

[alfresco@alf_n1 ~]$ cat $CATALINA_HOME/shared/classes/alfresco-global.properties
...
### ActiveMQ
messaging.broker.url=failover:(tcp://mq_n1.domain:61616?daemon=false&dynamicManagement=false&trace=false)?timeout=3000&randomize=false
#messaging.username=
#messaging.password=
...
[alfresco@alf_n1 ~]$

 

As mentioned at the beginning of this blog, ActiveMQ supports a lot of protocols so you can use pretty much what you want: TCP, NIO, SSL, NIO SSL, Peer (2 Peer), UDP, Multicast, HTTP, HTTPS, and so on. You can find all the details for that here.

To add authentication between Alfresco and ActiveMQ, you will need to enable the properties in the alfresco-global.properties (the two that I commented above) and define the appropriate authentication in the ActiveMQ broker configuration. There is an example in the Alfresco documentation.
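
As a minimal sketch of the Alfresco side only (the account name and password below are placeholders, and a matching account must of course be declared in the ActiveMQ broker configuration, as per the example mentioned above):

[alfresco@alf_n1 ~]$ cat >> $CATALINA_HOME/shared/classes/alfresco-global.properties << EOF
### ActiveMQ authentication (placeholder credentials)
messaging.username=alfresco
messaging.password=Alfr3sc0_mq_pwd
EOF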

 

Cet article Alfresco – ActiveMQ basic setup est apparu en premier sur Blog dbi services.

Deploying SQL Server 2019 container on RHEL 8 with podman

Wed, 2019-07-24 11:12

Having a fresh install of RHEL 8 on my lab environment, I was curious to take a look at the new containerization stuff from Red Hat in the context of SQL Server 2019. Chances are good that a future version of SQL Server will be available and supported on the latest version of Red Hat, but for now this blog post is purely experimental. This time I wanted to share with you some thoughts about the new Podman command.

First of all, we should be aware that since RHEL 8 Red Hat decided to replace Docker with CRI-O/Podman in order to provide a "daemonless" container world, especially for Kubernetes. In 2016, the Kubernetes project introduced the Container Runtime Interface (CRI). Basically, with CRI, Kubernetes can be container runtime-agnostic. CRI-O is an open source project initiated by Red Hat the same year; it gives the ability to run containers directly from Kubernetes without any unnecessary code or tooling, as long as the container remains OCI-compliant. Because Docker is no longer shipped (and officially not supported) by Red Hat since RHEL 8, we need a client tool for working with containers, and this is where Podman steps in. To cut a long story short, Podman implements almost all the Docker CLI commands and more.
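
Just in case Podman isn't there yet on your RHEL 8 host, it can be installed from the container-tools module; the alias is optional and only shown here as a hint to ease the transition (a small sketch):

$ sudo yum install -y podman      # shipped with the container-tools module on RHEL 8
$ podman --version
$ alias docker=podman             # optional: most existing docker command lines then work unchanged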

So, let's have an overview of Podman commands through the installation of a SQL Server 2019 based container. It is worth noting that Podman is not intended to be used in the context of a "standalone" container environment and should be used with a container orchestrator like K8s or an orchestration platform like OpenShift. That said, let's first create a host directory to persist the SQL Server database files.

$ sudo mkdir -p  /var/mssql/data
$ sudo chmod 755 -R /var/mssql/data

 

Then let’s download the SQL Server 2019 RHEL image. We will use the following Podman command:

$ sudo podman pull mcr.microsoft.com/mssql/rhel/server:2019-CTP3.1
Trying to pull mcr.microsoft.com/mssql/rhel/server:2019-CTP3.1...Getting image source signatures
Copying blob 079e961eee89: 70.54 MiB / 70.54 MiB [========================] 1m3s
Copying blob 1b493d38a6d3: 1.20 KiB / 1.20 KiB [==========================] 1m3s
Copying blob 89e62e5b4261: 333.24 MiB / 333.24 MiB [======================] 1m3s
Copying blob d39017c722a8: 174.82 MiB / 174.82 MiB [======================] 1m3s
Copying config dbba412361d7: 4.98 KiB / 4.98 KiB [==========================] 0s
Writing manifest to image destination
Storing signatures
dbba412361d7ca4fa426387e1d6fc3ec85e37d630bfe70e6599b5116d392394d

 

Note that if you're already comfortable with the Docker commands, the shift to Podman will be easy thanks to the similarity between the two tools. To get information about the freshly pulled image, we will use the following Podman command:

$ sudo podman images
REPOSITORY                            TAG           IMAGE ID       CREATED       SIZE
mcr.microsoft.com/mssql/rhel/server   2019-CTP3.1   dbba412361d7   3 weeks ago   1.79 GB
$ sudo podman inspect dbba
…
"GraphDriver": {
            "Name": "overlay",
            "Data": {
                "LowerDir": "/var/lib/containers/storage/overlay/b2769e971a1bdb62f1c0fd9dcc0e9fe727dca83f52812abd34173b49ae55e37d/diff:/var/lib/containers/storage/overlay/4b0cbf0d9d0ff230916734a790f47ab2adba69db44a79c8eac4c814ff4183c6d/diff:/var/lib/containers/storage/overlay/9197342671da8b555f200e47df101da5b7e38f6d9573b10bd3295ca9e5c0ae28/diff",
                "MergedDir": "/var/lib/containers/storage/overlay/b372c0d6ff718d2d182af4639870dc6e4247f684d81a8b2dc2649f8517b9fc53/merged",
                "UpperDir": "/var/lib/containers/storage/overlay/b372c0d6ff718d2d182af4639870dc6e4247f684d81a8b2dc2649f8517b9fc53/diff",
                "WorkDir": "/var/lib/containers/storage/overlay/b372c0d6ff718d2d182af4639870dc6e4247f684d81a8b2dc2649f8517b9fc53/work"
            }
        },
…

 

As shown above, Podman uses the CRI-O back-end store directory with the /var/lib/containers path, instead of using the Docker default storage location (/var/lib/docker).

Go ahead and let’s take a look at the Podman info command:

$ podman info
…
OCIRuntime:
    package: runc-1.0.0-54.rc5.dev.git2abd837.module+el8+2769+577ad176.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.0'
…
store:
  ConfigFile: /home/clustadmin/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay

 

The same kind of information is provided by the Docker info command, including the runtime and the graph driver name, which is overlay in my case. Generally speaking, creating and getting information about a container with Podman is pretty similar to what we would do with the usual Docker commands. Here is, for instance, the command to spin up a SQL Server container based on the RHEL image:

$ sudo podman run -d -e 'ACCEPT_EULA=Y' -e \
> 'MSSQL_SA_PASSWORD=Password1'  \
> --name 'sqltest' \
> -p 1460:1433 \
> -v /var/mssql/data:/var/opt/mssql/data:Z \
> mcr.microsoft.com/mssql/rhel/server:2019-CTP3.1
4f5128d36e44b1f55d23e38cbf8819041f84592008d0ebb2b24ff59065314aa4
$ sudo podman ps
CONTAINER ID  IMAGE                                            COMMAND               CREATED        STATUS            PORTS                   NAMES
4f5128d36e44  mcr.microsoft.com/mssql/rhel/server:2019-CTP3.1  /opt/mssql/bin/sq...  4 seconds ago  Up 3 seconds ago  0.0.0.0:1460->1433/tcp  sqltest
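
Before going further, a quick sanity check of the container can be done with the podman logs and exec commands (a sketch reusing the sa password set above; the sqlcmd path is the one shipped in the Microsoft image):

$ sudo podman logs sqltest | tail -20
$ sudo podman exec -ti sqltest /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P Password1 -Q "SELECT @@SERVERNAME"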

 

Here comes the interesting part. Looking at the pstree output, we may notice that there are no dependencies on any (Docker) daemon with the CRI-O implementation. Usually, with the Docker implementation, we find the containerd daemon and the related shim for the process within the tree.

$ pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─…
        ├─conmon─┬─sqlservr─┬─sqlservr───138*[{sqlservr}]
        │        │          └─{sqlservr}

 

By using the runc command below, we may notice that the MSSQL container (identified by its ID here) is actually running through CRI-O and the runc runtime.

$ sudo runc list -q
4f5128d36e44b1f55d23e38cbf8819041f84592008d0ebb2b24ff59065314aa4

 

Let's have a look at the existing namespaces. PID 9449 corresponds to the SQL Server process running in isolation through Linux namespaces.

$ sudo lsns 
…
4026532116 net         2  9449 root   /opt/mssql/bin/sqlservr
4026532187 mnt         2  9449 root   /opt/mssql/bin/sqlservr
4026532188 uts         2  9449 root   /opt/mssql/bin/sqlservr
4026532189 ipc         2  9449 root   /opt/mssql/bin/sqlservr
4026532190 pid         2  9449 root   /opt/mssql/bin/sqlservr

$ ps aux | grep sqlservr
root       9449  0.1  0.6 152072 25336 ?        Ssl  05:08   0:00 /opt/mssql/bin/sqlservr
root       9465  5.9 18.9 9012096 724648 ?      Sl   05:08   0:20 /opt/mssql/bin/sqlservr
clustad+   9712  0.0  0.0  12112  1064 pts/0    S+   05:14   0:00 grep --color=auto sqlservr

 

We can double check that the process belongs to the SQL Server container by using the nsenter command:

sudo nsenter -t 17182 --mount --uts --ipc --net --pid sh
sh-4.2# ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.7 152076 28044 ?        Ssl  Jul23   0:00 /opt/mssql/bin/sqlservr
root          9  2.2 19.7 9034224 754820 ?      Sl   Jul23   0:28 /opt/mssql/bin/sqlservr
root        319  0.0  0.0  13908  3400 ?        S    00:01   0:00 sh
root        326  0.0  0.1  53832  3900 ?        R+   00:02   0:00 ps aux

 

Well, we used different Podman commands to spin up a container that meets the OCI specification, like Docker. For the sake of curiosity, let's build a custom image from a Dockerfile. In fact, this is a custom image we developed for customers to meet our best practices requirements.

$ ls -l
total 40
drwxrwxr-x. 2 clustadmin clustadmin   70 Jul 24 02:06 BestPractices
drwxrwxr-x. 2 clustadmin clustadmin   80 Jul 24 02:06 DMK
-rw-rw-r--. 1 clustadmin clustadmin  614 Jul 24 02:06 docker-compose.yml
-rw-rw-r--. 1 clustadmin clustadmin 2509 Jul 24 02:06 Dockerfile
-rw-rw-r--. 1 clustadmin clustadmin 3723 Jul 24 02:06 entrypoint.sh
-rw-rw-r--. 1 clustadmin clustadmin 1364 Jul 24 02:06 example.docker-swarm-compose.yml
-rw-rw-r--. 1 clustadmin clustadmin  504 Jul 24 02:06 healthcheck.sh
-rw-rw-r--. 1 clustadmin clustadmin   86 Jul 24 02:06 mssql.conf
-rw-rw-r--. 1 clustadmin clustadmin 4497 Jul 24 02:06 postconfig.sh
-rw-rw-r--. 1 clustadmin clustadmin 2528 Jul 24 02:06 Readme.md
drwxrwxr-x. 2 clustadmin clustadmin   92 Jul 24 02:06 scripts

 

To build an image from a Dockerfile the corresponding Podman command is as follow:

$ sudo podman build -t dbi_mssql_linux:2019-CTP3.1 .
…
--> 5db120fba51f3adc7482ec7a9fed5cc4194f13e97b855d9439a1386096797c39
STEP 65: FROM 5db120fba51f3adc7482ec7a9fed5cc4194f13e97b855d9439a1386096797c39
STEP 66: EXPOSE ${MSSQL_TCP_PORT}
--> 8b5e8234af47adb26f80d64abe46715637bd48290b4a6d7711ddf55c393cd5a8
STEP 67: FROM 8b5e8234af47adb26f80d64abe46715637bd48290b4a6d7711ddf55c393cd5a8
STEP 68: ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
--> 11045806b8af7cf2f67e5a279692e6c9e25212105bcd104ed17b235cdaea97fe
STEP 69: FROM 11045806b8af7cf2f67e5a279692e6c9e25212105bcd104ed17b235cdaea97fe
STEP 70: CMD ["tail -f /dev/null"]
--> bcb8c26d503010eb3e5d72da4b8065aa76aff5d35fac4d7958324ac3d97d5489
STEP 71: FROM bcb8c26d503010eb3e5d72da4b8065aa76aff5d35fac4d7958324ac3d97d5489
STEP 72: HEALTHCHECK --interval=15s CMD [ "/usr/local/bin/healthcheck.sh" ]
--> e7eedf0576f73c95b19adf51c49459b00449da497cf7ae417e597dd39a9e4c8f
STEP 73: COMMIT dbi_mssql_linux:2019-CTP3.1

 

The image built is now available in the local repository:

$ sudo podman images
REPOSITORY                            TAG           IMAGE ID       CREATED         SIZE
localhost/dbi_mssql_linux             2019-CTP3.1   e7eedf0576f7   2 minutes ago   1.79 GB
mcr.microsoft.com/mssql/rhel/server   2019-CTP3.1   dbba412361d7   3 weeks ago     1.79 GB

 

The next step will consist in spinning up a SQL Server container based on this new image. Note that I used a custom parameter DMK=Y to drive the creation of the DMK maintenance tool in our case, which includes the deployment of a custom dbi_tools database and related objects that carry out the database maintenance.

$ sudo podman run -d -e 'ACCEPT_EULA=Y' \
> -e 'MSSQL_SA_PASSWORD=Password1' -e 'DMK=Y'  \
> --name 'sqltest2' \
> -p 1470:1433 \
> localhost/dbi_mssql_linux:2019-CTP3.1
d057e0ca41f08a948de4206e9aa07b53450c2830590f2429e50458681d230f6b

 

Let's check if the dbi_tools database has been created during the container runtime phase:

$ sudo podman exec -ti d057 /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P Password1 -Q"SELECT name from sys.databases"
name
--------------------------------------------------------------------------------------------------------------------------------
master
tempdb
model
msdb
dbi_tools

 

Finally, to make the transition to a future blog post, the Podman tool comes with extra commands (under development) that are not available in the Docker CLI. The following example generates a YAML deployment file and the corresponding service from an existing container. Please note however that containers with volumes are not supported yet.

The container definition is as follows:

$ sudo podman run -d -e 'ACCEPT_EULA=Y' -e \
'MSSQL_SA_PASSWORD=Password1'  \
--name 'sqltestwithnovolumes' \
-p 1480:1433 \
mcr.microsoft.com/mssql/rhel/server:2019-CTP3.1
7e99581eaec4c91d7c13af4525bfb3805d5b56e675fdb53d0061c231294cd442

 

And we get the corresponding YAML file generated by the Podman command:

$ sudo podman generate kube -s 7e99
# Generation of Kubernetes YAML is still under development!
#
# Save the output of this file and use kubectl create -f to import
# it into Kubernetes.
#
# Created with podman-1.0.2-dev
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2019-07-24T03:52:18Z
  labels:
    app: sqltestwithnovolumes
  name: sqltestwithnovolumes
spec:
  containers:
  - command:
    - /opt/mssql/bin/sqlservr
    env:
    - name: PATH
      value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    - name: TERM
      value: xterm
    - name: HOSTNAME
    - name: container
      value: oci
    - name: ACCEPT_EULA
      value: "Y"
    - name: MSSQL_SA_PASSWORD
      value: Password1
    image: mcr.microsoft.com/mssql/rhel/server:2019-CTP3.1
    name: sqltestwithnovolumes
    ports:
    - containerPort: 1433
      hostPort: 1480
      protocol: TCP
    resources: {}
    securityContext:
      allowPrivilegeEscalation: true
      capabilities: {}
      privileged: false
      readOnlyRootFilesystem: false
    workingDir: /
status: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2019-07-24T03:52:18Z
  labels:
    app: sqltestwithnovolumes
  name: sqltestwithnovolumes
spec:
  ports:
  - name: "1433"
    nodePort: 30309
    port: 1433
    protocol: TCP
    targetPort: 0
  selector:
    app: sqltestwithnovolumes
  type: NodePort
status:
  loadBalancer: {}

 

By default, a service of type NodePort has been created by the command. This last command definitely needs further testing!
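
As suggested in the generated header, the output can be saved and then replayed against a Kubernetes cluster; a quick sketch, assuming kubectl is already configured against a target cluster:

$ sudo podman generate kube -s 7e99 > sqltestwithnovolumes.yml
$ kubectl create -f sqltestwithnovolumes.yml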

See you

Cet article Deploying SQL Server 2019 container on RHEL 8 with podman est apparu en premier sur Blog dbi services.

SQL Server 2019 availability group R/W connection redirection, routing mesh and load balancing

Tue, 2019-07-23 10:59

The SQL Server 2019 availability group feature will provide secondary-to-primary replica read/write connection redirection. I wrote about it in a previous blog post here. It consists in redirecting client application connections to the primary replica regardless of the target server specified in the connection string. That's pretty interesting in some scenarios such as read scale-out or specific multi-subnet configurations where creating the traditional AG listener is not a viable option.

The new R/W connection redirection capability does the job, but the one-million-dollar question here is: what happens if one of the replicas specified in my connection string suddenly becomes unavailable? Referring to the BOL, the connection will fail regardless of the role that the replica on the target server plays, but we can mitigate the issue by introducing the failover partner parameter in the connection string. As a reminder, the Failover Partner keyword in the connection string works in a database mirroring setup and prevents prolonged application downtime. But from my point of view, we could likely go another way and benefit from all the power of this new availability group feature by introducing a load balancer on top of this topology, as we could do with Docker Swarm or K8s architectures. Indeed, if we take a closer look, this new mechanism provided by SQL Server 2019 is pretty similar to the routing mesh capabilities of container orchestrators, with the same advantages and weaknesses as well. I wrote a blog post about Docker Swarm architectures where we need to implement a proxy to load balance the traffic to avoid getting stuck with the routing mesh capability when a node gets unhealthy.
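
As a quick reminder of the prerequisite (detailed in the previous blog post mentioned above), the R/W redirection relies on a routing URL defined on each replica. Here is a hedged sketch for one replica, using the AG and replica names that appear later in this post (authentication, ports and the routing URL itself have to be adapted to your environment, and the statement must be repeated for every replica):

sqlcmd -S WIN20191 -E -Q "ALTER AVAILABILITY GROUP AG2019 MODIFY REPLICA ON N'WIN20191' WITH (PRIMARY_ROLE (READ_WRITE_ROUTING_URL = N'tcp://192.168.40.205:1433'))"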

I just applied the same kind of configuration by using an HAProxy (but you can obviously use your own) with my availability group topology and the behavior was basically the same. Here is the intended behavior:

 

Here is the configuration of my HAProxy, including my 3 AG replicas (WIN20191, WIN20192, WIN20193) and a round-robin algorithm at the bottom:

…
backend rserve_backend
    mode tcp
    option tcplog
    option log-health-checks
    option redispatch
    log global
    balance roundrobin
    timeout connect 10s
    timeout server 1m
    server WIN20191 192.168.40.205:1433 check
    server WIN20192 192.168.40.206:1433 check
    server WIN20193 192.168.40.207:1433 check
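
For completeness, the backend above is fed by a simple TCP frontend bound on port 81. This is only a sketch: the frontend name and the haproxy.cfg path are assumptions to adapt to your own setup.

cat << 'EOF' | sudo tee -a /etc/haproxy/haproxy.cfg
frontend rserve_frontend
    bind *:81
    mode tcp
    option tcplog
    timeout client 1m
    default_backend rserve_backend
EOF
sudo systemctl reload haproxy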

 

Let's do a try with connections directly to my HAProxy, which is listening on port 81 in my test scenario. Note that for this first test I will connect to the master database to force the local connection to stick to each replica rather than going through the R/W redirection process. The goal is to check if the round-robin algorithm comes into play…

$connectionString = "Server=192.168.40.14,81;uid=sa; pwd=xxxx;Integrated Security=False;Initial Catalog=master;pooling=false"

$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = $connectionString
$connection.Open()

$sqlCommandText="SELECT 'Current server : ' + @@SERVERNAME AS server_name"
$sqlCommand = New-Object system.Data.sqlclient.SqlCommand($sqlCommandText,$connection)
$sqlCommand.ExecuteScalar()

$connection.Close()
$connection.Dispose()

 

… and that's the case, as shown below:

Test connexion initial server nb : 0 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 1 - 192.168.40.14,81 - Current server : WIN20192
Test connexion initial server nb : 2 - 192.168.40.14,81 - Current server : WIN20193
Test connexion initial server nb : 3 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 4 - 192.168.40.14,81 - Current server : WIN20192
Test connexion initial server nb : 5 - 192.168.40.14,81 - Current server : WIN20193

 

Let's do a try by forcing the R/W redirection now. This time I set the correct target database, named dummy2, for my availability group AG2019.

$connectionString = "Server=192.168.40.14,81;uid=sa; pwd=xxxx;Integrated Security=False;Initial Catalog=dummy2;pooling=false"

$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = $connectionString
$connection.Open()

$sqlCommandText="SELECT 'Current server : ' + @@SERVERNAME AS server_name"
$sqlCommand = New-Object system.Data.sqlclient.SqlCommand($sqlCommandText,$connection)
$sqlCommand.ExecuteScalar()

$connection.Close()
$connection.Dispose()

Test connexion initial server nb : 0 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 1 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 2 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 3 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 4 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 5 - 192.168.40.14,81 - Current server : WIN20191

 

This time the R/W redirection takes effect and each established connection is redirected to my primary replica, WIN20191.

Finally, let's simulate an outage of one of my replicas, let's say the WIN20193 replica, with a turn-off operation, and let's see what happens below:

Test connexion initial server nb : 32 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 33 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 34 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 35 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 36 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 37 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 38 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 39 - 192.168.40.14,81 - Current server : WIN20191
Test connexion initial server nb : 40 - 192.168.40.14,81 - Current server : WIN20191

 

Well, from a connection perspective nothing has changed and HAProxy continues to load balance connections between the remaining healthy replicas. The R/W connection redirection mechanism still continues to come into play as well. A quick look at the HAProxy log indicates the WIN20193 replica got unhealthy and HAProxy has evicted this replica from the game.

[WARNING] 203/063813 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 timeout, check duration: 2001ms, status: 2/3 UP.
[WARNING] 203/063818 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 timeout, check duration: 2002ms, status: 1/3 UP.
[WARNING] 203/063822 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 timeout, check duration: 2001ms, status: 0/2 DOWN.
[WARNING] 203/063822 (1772) : Server rserve_backend/WIN20193 is DOWN. 2 active and 0 backup servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 203/063848 (1772) : Health check for server rserve_backend/WIN20193 failed, reason: Layer4 connection problem, info: "No route to host", check duration: 4ms, status: 0/2 DOWN.

 

The new R/W redirection capability provided by Microsoft will certainly extend the possible scenarios with availability groups. With previous versions of SQL Server, using a load balancer was limited to R/O workloads, but SQL Server 2019 will probably change the game on this topic. Let's see what happens in the future!

Cet article SQL Server 2019 availability group R/W connection redirection, routing mesh and load balancing est apparu en premier sur Blog dbi services.

ORA-01000 and agent13c

Tue, 2019-07-23 03:51

Recently I received error messages from OEM 13c saying too many cursors were opened in a database:

instance_throughput:ORA-01000: maximum open cursors exceeded

My database currently had this open_cursors value:

SQL> show parameter open_cursors

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     300

I decided to increase its value to 800:

SQL> alter system set open_cursors=800;

System altered.

But a few minutes later I received the same message again. I decided to have a closer look to discover what was happening.

SQL> SELECT  max(a.value) as highest_open_cur, p.value as max_open_cur FROM v$sesstat a, v$statname b, v$parameter p WHERE  a.statistic# = b.statistic#  and b.name = 'opened cursors current' and p.name= 'open_cursors' group by p.value;

HIGHEST_OPEN_CUR     MAX_OPEN_CUR
   300                 800

So I need to find out which session is causing the error:

SQL> select a.value, s.username, s.sid, s.serial# from v$sesstat a, v$statname b, v$session s where a.statistic# = b.statistic#  and s.sid=a.sid and b.name = 'opened cursors current' and s.username is not null;

     VALUE USERNAME                              SID    SERIAL#
---------- ------------------------------ ---------- ----------
         9 SYS                                     6      36943
         1 SYS                                   108      31137
         1 SYS                                   312      15397
       300 SYS                                   417      31049
        11 SYS                                   519      18527
         7 SYS                                   619      48609
         1 SYS                                   721      51139
         0 PUBLIC                                922         37
        17 SYS                                  1024          1
        14 SYS                                  1027      25319
         1 SYS                                  1129      40925

A SYS connection is using 300 cursors :=(( Let's see what it is:
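
A query like the following (a small sketch using the v$open_cursor view and the SID 417 identified above) shows which statements those cursors belong to:

sqlplus -s / as sysdba << 'EOF'
set pagesize 100 linesize 200
select sql_text, count(*) as open_cur
from v$open_cursor
where sid = 417
group by sql_text
order by 2 desc;
EOF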

Great, the agent 13c is causing the problem :=((

I had already encountered this kind of problem at another client's site. In fact, the agent 13c is using metrics to determine the High Availability Disk or Media Backup every 15 minutes and is using a lot of cursors. The best way is to disable those metrics to avoid ORA-01000 errors:
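
Once the metric collections are disabled from the Cloud Control console, the agent can be reloaded so that the change is taken into account (a sketch; the agent home path below is an assumption to adapt to your own installation):

export AGENT_HOME=/u01/app/oracle/agent13c/agent_inst    # assumption: adapt to your agent installation path
$AGENT_HOME/bin/emctl reload agent
$AGENT_HOME/bin/emctl status agent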

After reloading the agent and reevaluating the alert, the incident disappeared successfully :=)

Cet article ORA-01000 and agent13c est apparu en premier sur Blog dbi services.

Documentum – D2+Pack Plugins not installed correctly

Sun, 2019-07-21 08:43

In a previous blog, I explained how D2 can be installed in silent mode. In this blog, I will talk about a possible issue that might happen when doing so: the D2+Pack Plugins aren't being installed, even if you ask D2 to install them, and there is no message or error related to this issue. The first time I had this issue was several years ago, but I never blogged about it. I faced it again recently, so I thought I would this time.

So first, let’s prepare the D2 and D2+Pack packages for the silent installation. I will take the D2_template.xml file from my previous blog as a starting point for the silent parameter file:

[dmadmin@cs_01 ~]$ cd $DOCUMENTUM/D2-Install/
[dmadmin@cs_01 D2-Install]$ ls *.zip *.tar.gz
-rw-r-----. 1 dmadmin dmadmin 491128907 Jun 16 08:12 D2_4.7.0_P25.zip
-rw-r-----. 1 dmadmin dmadmin  61035679 Jun 16 08:12 D2_pluspack_4.7.0.P25.zip
-rw-r-----. 1 dmadmin dmadmin 122461951 Jun 16 08:12 emc-dfs-sdk-7.3.tar.gz
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ unzip $DOCUMENTUM/D2-Install/D2_4.7.0_P25.zip -d $DOCUMENTUM/D2-Install/
[dmadmin@cs_01 D2-Install]$ unzip $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25.zip -d $DOCUMENTUM/D2-Install/
[dmadmin@cs_01 D2-Install]$ unzip $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/C2-Dar-Install.zip -d $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/
[dmadmin@cs_01 D2-Install]$ unzip $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/D2-Bin-Dar-Install.zip -d $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/
[dmadmin@cs_01 D2-Install]$ unzip $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/O2-Dar-Install.zip -d $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/
[dmadmin@cs_01 D2-Install]$ tar -xzvf $DOCUMENTUM/D2-Install/emc-dfs-sdk-7.3.tar.gz -C $DOCUMENTUM/D2-Install/
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ #See the previous blog for the content of the "/tmp/dctm_install/D2_template.xml" file
[dmadmin@cs_01 D2-Install]$ export d2_install_file=$DOCUMENTUM/D2-Install/D2.xml
[dmadmin@cs_01 D2-Install]$ cp /tmp/dctm_install/D2_template.xml ${d2_install_file}
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ sed -i "s,###WAR_REQUIRED###,true," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$ sed -i "s,###BPM_REQUIRED###,true," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$ sed -i "s,###DAR_REQUIRED###,true," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ sed -i "s,###DOCUMENTUM###,$DOCUMENTUM," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ sed -i "s,###PLUGIN_LIST###,$DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/C2-Install-4.7.0.jar;$DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/D2-Bin-Install-4.7.0.jar;$DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/O2-Install-4.7.0.jar;," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ sed -i "s,###JMS_HOME###,$DOCUMENTUM_SHARED/wildfly9.0.1," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ sed -i "s,###DFS_SDK_PACKAGE###,emc-dfs-sdk-7.3," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ read -s -p "  ----> Please enter the Install Owner's password: " dm_pw; echo; echo
  ----> Please enter the Install Owner's password: <TYPE HERE THE PASSWORD>
[dmadmin@cs_01 D2-Install]$ sed -i "s,###INSTALL_OWNER###,dmadmin," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$ sed -i "s,###INSTALL_OWNER_PASSWD###,${dm_pw}," ${d2_install_file}
[dmadmin@cs_01 D2-Install]$
[dmadmin@cs_01 D2-Install]$ sed -i "s/###DOCBASE_LIST###/Docbase1/" ${d2_install_file}
[dmadmin@cs_01 D2-Install]$

 

Now that the silent file is ready and that all source packages are available, we can start the D2 Installation with the command below. Please note the usage of the tracing/debugging options as well as the usage of the “-Djava.io.tmpdir” Java option to ask D2 to put all tmp files in a specific directory, with this, D2 is supposed to trace/debug everything and use my specific temporary folder:

[dmadmin@cs_01 D2-Install]$ java -DTRACE=true -DDEBUG=true -Djava.io.tmpdir=$DOCUMENTUM/D2-Install/tmp -jar $DOCUMENTUM/D2-Install/D2_4.7.0_P25/D2-Installer-4.7.0.jar ${d2_install_file}

 

The D2 Installer printed the following extract:

...
Installing plugin: $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/C2-Install-4.7.0.jar
Plugin install command: [java, -jar, $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/C2-Install-4.7.0.jar, $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/scripts/C6-Plugins-Install_new.xml]
Line read: [ Starting automated installation ]
Installing plugin: $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/D2-Bin-Install-4.7.0.jar
Plugin install command: [java, -jar, $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/D2-Bin-Install-4.7.0.jar, $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/scripts/C6-Plugins-Install_new.xml]
Line read: [ Starting automated installation ]
Installing plugin: $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/O2-Install-4.7.0.jar
Plugin install command: [java, -jar, $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/O2-Install-4.7.0.jar, $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/scripts/C6-Plugins-Install_new.xml]
Line read: [ Starting automated installation ]
Installing plugin: $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/plugin/D2-Widget-Install.jar
...
...
Current line: #################################
Current line: #           Plugins               #
Current line: #################################
Current line: #plugin_1=../C2/C2-Plugin.jar
Updating line with 'plugin_'.
Updating plugin 1 with plugin name: D2-Widget-Plugin.jar and config exclude value of: false
Updating plugin 2 with plugin name: D2-Specifications-Plugin.jar and config exclude value of: false
Current line: #plugin_2=../O2/O2-Plugin.jar
Current line: #plugin_3=../P2/P2-Plugin.jar
...

 

As you can see, there are no errors, so if you aren't paying attention, you might think that the D2+Pack is properly installed. It's not. At the end of the extract above, you can see that the D2 Installer is updating the plugins list with some elements (D2-Widget-Plugin.jar & D2-Specifications-Plugin.jar). If there were no issue, the D2+Pack Plugins would have been added to this section as well, which isn't the case.

You can check all temporary files and all log files: it will not be printed anywhere that there was an issue while installing the D2+Pack Plugins. In fact, there are 3 things missing:

  • The DARs of the D2+Pack Plugins weren’t installed
  • The libraries of the D2+Pack Plugins weren’t deployed into the JMS
  • The libraries of the D2+Pack Plugins weren’t packaged in the WAR files

There is a way to quickly check if the D2+Pack Plugins DARs have been installed: just look inside the docbase config folder; there should be one log file for the D2 Core DARs as well as one log file for each of the D2+Pack Plugins. This is what you should get:

[dmadmin@cs_01 D2-Install]$ cd $DOCUMENTUM/dba/config/Docbase1/
[dmadmin@cs_01 Docbase1]$ ls -ltr *.log
-rw-r-----. 1 dmadmin dmadmin  62787 Jun 16 08:18 D2_CORE_DAR.log
-rw-r-----. 1 dmadmin dmadmin   4794 Jun 16 08:20 D2-C2_dar.log
-rw-r-----. 1 dmadmin dmadmin   3105 Jun 16 08:22 D2-Bin_dar.log
-rw-r-----. 1 dmadmin dmadmin   2262 Jun 16 08:24 D2-O2_DAR.log
[dmadmin@cs_01 Docbase1]$

 

If you only have “D2_CORE_DAR.log”, then you are potentially facing this issue. You could also check the “csDir” folder that you put in the D2 silent parameter file: if this folder doesn’t contain “O2-API.jar” or “C2-API.jar” or “D2-Bin-API.jar”, then you have the issue as well. Obviously, you could also check the list of installed DARs in the repository…
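
A hedged sketch of that last check through idql (the dmc_dar type holds the DARs installed in the repository; the password variable is the one read earlier in this post):

idql Docbase1 -Udmadmin -P${dm_pw} << 'EOF'
select object_name, r_creation_date from dmc_dar order by object_name;
go
EOF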

So what's the issue? Well, you remember above when I mentioned the "-Djava.io.tmpdir" Java option to specifically ask D2 to put all temporary files under a certain location? The D2 Installer, for the D2 part, is using this option without issue… But for the D2+Pack installation, there is actually a hardcoded path for the temporary files, which is /tmp. Therefore, it ignores this Java option and instead tries to execute the installation under /tmp.

This is the issue I faced a few times already and it's the one I wanted to talk about in this blog. For security reasons, you might have to deal from time to time with specific mount options on file systems. In this case, the "noexec" option was set on the /tmp mount point, therefore D2 wasn't able to execute commands under /tmp and, instead of printing an error, it just silently bypassed the installation. I had an SR opened with Documentum Support (when it was still EMC) to see if it was possible to use the Java option and not /tmp, but it looks like it's still not solved since I had the exact same issue with D2 4.7 P25, which was released very recently.

Since there is apparently no way to specify which temporary folder should be used for the D2+Pack Plugins, you should either perform the installation manually (DAR installation + libraries in JMS & WAR files) or remove the "noexec" option on the file system for the duration of the installation:

[dmadmin@cs_01 Docbase1]$ mount | grep " /tmp"
/dev/mapper/VolGroup00-LogVol06 on /tmp type ext4 (rw,noexec,nosuid,nodev)
[dmadmin@cs_01 Docbase1]$
[dmadmin@cs_01 Docbase1]$ sudo mount -o remount,exec /tmp
[dmadmin@cs_01 Docbase1]$ mount | grep " /tmp"
/dev/mapper/VolGroup00-LogVol06 on /tmp type ext4 (rw,nosuid,nodev)
[dmadmin@cs_01 Docbase1]$
[dmadmin@cs_01 Docbase1]$ #Execute the D2 Installer here
[dmadmin@cs_01 Docbase1]$
[dmadmin@cs_01 Docbase1]$ sudo mount -o remount /tmp
[dmadmin@cs_01 Docbase1]$ mount | grep " /tmp"
/dev/mapper/VolGroup00-LogVol06 on /tmp type ext4 (rw,noexec,nosuid,nodev)

 

With the workaround in place, the D2 Installer should now print the following (same extract as above):

...
Installing plugin: $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/C2-Install-4.7.0.jar
Plugin install command: [java, -jar, $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/C2-Install-4.7.0.jar, $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/scripts/C6-Plugins-Install_new.xml]
Line read: [ Starting automated installation ]
Line read: Current MAC address : [ Starting to unpack ]
Line read: [ Processing package: core (1/2) ]
Line read: [ Processing package: DAR (2/2) ]
Line read: [ Unpacking finished ]
Line read: [ Writing the uninstaller data ... ]
Line read: [ Automated installation done ]
Installing plugin: $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/D2-Bin-Install-4.7.0.jar
Plugin install command: [java, -jar, $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/D2-Bin-Install-4.7.0.jar, $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/scripts/C6-Plugins-Install_new.xml]
Line read: [ Starting automated installation ]
Line read: Current MAC address : [ Starting to unpack ]
Line read: [ Processing package: core (1/2) ]
Line read: [ Processing package: DAR (2/2) ]
Line read: [ Unpacking finished ]
Line read: [ Writing the uninstaller data ... ]
Line read: [ Automated installation done ]
Installing plugin: $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/O2-Install-4.7.0.jar
Plugin install command: [java, -jar, $DOCUMENTUM/D2-Install/D2_pluspack_4.7.0.P25/Plugins/O2-Install-4.7.0.jar, $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/scripts/C6-Plugins-Install_new.xml]
Line read: [ Starting automated installation ]
Line read: Current MAC address : [ Starting to unpack ]
Line read: [ Processing package: core (1/2) ]
Line read: [ Processing package: DAR (2/2) ]
Line read: [ Unpacking finished ]
Line read: [ Writing the uninstaller data ... ]
Line read: [ Automated installation done ]
Installing plugin: $DOCUMENTUM/D2-Install/tmp/D2_4.7.0/plugin/D2-Widget-Install.jar
...
...
Current line: #################################
Current line: #           Plugins               #
Current line: #################################
Current line: #plugin_1=../C2/C2-Plugin.jar
Updating line with 'plugin_'.
Updating plugin 1 with plugin name: D2-Widget-Plugin.jar and config exclude value of: false
Updating plugin 2 with plugin name: C2-Plugin.jar and config exclude value of: false
Updating plugin 3 with plugin name: O2-Plugin.jar and config exclude value of: false
Updating plugin 4 with plugin name: D2-Specifications-Plugin.jar and config exclude value of: false
Updating plugin 5 with plugin name: D2-Bin-Plugin.jar and config exclude value of: false
Current line: #plugin_2=../O2/O2-Plugin.jar
Current line: #plugin_3=../P2/P2-Plugin.jar
...

 

As you can see above, the output is quite different: it means that the D2+Pack Plugins have been installed.

 

Cet article Documentum – D2+Pack Plugins not installed correctly est apparu en premier sur Blog dbi services.

Schedule reboots of your AWS instances and how that can result in a hard reboot and corruption

Wed, 2019-07-17 02:50

From time to time you might need to reboot your AWS instances. Maybe you applied some patches, or for whatever other reason. Rebooting an AWS instance can be done in several ways: you can of course do that directly from the AWS console, and you can use the AWS command line utilities as well. If you want to schedule a reboot, you can either do that using CloudWatch or you can use SSM Maintenance Windows. In this post we will only look at CloudWatch and Systems Manager as these two can be used to schedule the reboot easily using AWS native utilities. You could, of course, do that as well by using cron and the AWS command line utilities, but this is not the scope of this post.
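
Just to give an idea, the direct command line way is a one-liner (a sketch; the instance ID is a placeholder):

aws ec2 reboot-instances --instance-ids i-0123456789abcdef0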

For CloudWatch the procedure for rebooting instances is the following: Create a new rule:

Go for "Schedule" and give a cron expression. In this case it means: 16-July-2019 at 07:45. Select the "EC2 RebootInstances API call" and provide the instance IDs you want to have rebooted. There is one limitation: you can only add up to five targets. If you need more, then you have to use Systems Manager as described later in this post. You should pre-create an IAM role with sufficient permissions which you can use for this, as otherwise a new one will be created each time.

Finally give a name and a description, that’s it:


Once time reaches your cron expression target the instance(s) will reboot.

The other solution for scheduling stuff against many instances is to use AWS SSM. It requires a bit more preparation work, but in the end this is the solution we decided to go for, as more instances can be scheduled with one maintenance window (up to 50) and you can combine several tasks, e.g. executing something before the reboot and something else after it.

The first step is to create a new maintenance window:

Of course it needs a name and an optional description:

Again, in this example, we use a cron expression for the scheduling (same as above in the CloudWatch example). Be aware that this is UTC time:

Once the maintenance window is created we need to attach a task to it. Until now we only specified a time to run something but we did not specify what to run. Attaching a task can be done in the task section of the maintenance window:

In this case we go for an “Automation task”. Name and description are not required:

The important part is the document to run, in our case it is “AWS-RestartEC2Instance”:

Choose the instances you want to run the document against:

And finally specify the concurrency and error count and again, an IAM role with sufficient permissions to perform the actions defined in the document:

Last, but not least, specify a pseudo parameter called “{TARGET_ID}” which will tell AWS SSM to run that against all the instances you selected in the upper part of the screen:

That's it. Your instances will be rebooted at the time you specified in the cron expression. All fine and easy, and you never have to worry about scheduled instance reboots. Just adjust the cron expression and maybe the list of instances and you are done for the next scheduled reboot. Really? We did it like that against 100 instances and we got a real surprise. What happened? Not many, but a few instances were rebooted hard, and one of them even needed to be restored afterwards. Why? This never happened in the tests we did before. When an instance does not reboot within 4 minutes, AWS performs a hard reboot. This can lead to corruption, as stated here. When you have busy instances at the time of the reboot, this is not what you want. On Windows you get something like this:

You can easily reproduce that by putting a Windows system under heavy load with a CPU stress test and then scheduling a reboot as described above.

In the background the automation document calls aws:changeInstanceState and that comes with a force parameter:

… and here we have it again: Risk of corruption. When you take a closer look at the automation document that stops an EC2 instance you can see that as well:

So what is the conclusion of all this? It is not to blame AWS for anything: everything is documented and works as documented. Testing in a test environment does not necessarily mean it works in production as well. Even if it is documented, you might not expect it because your tests went fine and you missed the part of the documentation where the behavior is explained. AWS Systems Manager is still a great tool for automating tasks, but you really need to understand what happens before implementing it in production. And finally: working on public clouds makes many things easier, but other things harder to understand and troubleshoot.

Cet article Schedule reboots of your AWS instances and how that can result in a hard reboot and corruption est apparu en premier sur Blog dbi services.

Email Spoofing

Mon, 2019-07-15 11:07

Have you ever had this unhealthy sensation of being accused of something that does not concern you? Of feeling helpless in the face of an accusing mail which, because of its imperative and accusing tone, has the gift of casting opprobrium on you?

This is the purpose of this particular kind of sextortion mail that uses spoofing to try to extort money from you: a message from a supposed "hacker" who claims to have hacked into your computer. He threatens to publish compromising images taken without your knowledge with your webcam and, most of the time, asks you for a ransom in virtual currency.

Something like that:

 

Date:  Friday, 24 May 2019 at 09:19 UTC+1
Subject: oneperson
Your account is hacked! Renew the pswd immediately!
You do not heard about me and you are definitely wondering why you’re receiving this particular electronic message, proper?
I’m ahacker who exploitedyour emailand digital devicesnot so long ago.
Do not waste your time and make an attempt to communicate with me or find me, it’s not possible, because I directed you a letter from YOUR own account that I’ve hacked.
I have started malware to the adult vids (porn) site and suppose that you watched this website to enjoy it (you understand what I mean).
Whilst you have been keeping an eye on films, your browser started out functioning like a RDP (Remote Control) that have a keylogger that gave me authority to access your desktop and camera.
Then, my softaquiredall data.
You have entered passcodes on the online resources you visited, I intercepted all of them.
Of course, you could possibly modify them, or perhaps already modified them.
But it really doesn’t matter, my app updates needed data regularly.
And what did I do?
I generated a reserve copy of every your system. Of all files and personal contacts.
I have managed to create dual-screen record. The 1 screen displays the clip that you were watching (you have a good taste, ha-ha…), and the second part reveals the recording from your own webcam.
What exactly must you do?
So, in my view, 1000 USD will be a reasonable amount of money for this little riddle. You will make the payment by bitcoins (if you don’t understand this, search “how to purchase bitcoin” in Google).
My bitcoin wallet address:
1816WoXDtSmAM9a4e3HhebDXP7DLkuaYAd
(It is cAsE sensitive, so copy and paste it).
Warning:
You will have 2 days to perform the payment. (I built in an exclusive pixel in this message, and at this time I understand that you’ve read through this email).
To monitorthe reading of a letterand the actionsin it, I utilizea Facebook pixel. Thanks to them. (Everything thatis usedfor the authorities may helpus.)

In the event I do not get bitcoins, I shall undoubtedly give your video to each of your contacts, along with family members, colleagues, etc?

 

Users who are victims of these scams receive a message from a stranger who presents himself as a hacker. This alleged "hacker" claims to have taken control of his victim's computer following a visit to a pornographic site (or any other site that morality would condemn). The cybercriminal then claims to have compromising videos of the victim made with his webcam. He threatens to publish them to the victim's personal or even professional contacts if the victim does not pay him a ransom. This ransom, which ranges from a few hundred to several thousand dollars, is claimed in a virtual currency (usually Bitcoin, but not only).

To scare the victim even more, cybercriminals sometimes go so far as to write to the victim with his or her own email address, in order to make him or her believe that they have actually taken control of his or her account. 

First of all, there is no need to be afraid. Even if the “hack” announced by the cybercriminals is not impossible in theory, in practice it remains technically complex and above all time-consuming to carry out. Since the scammers target their victims by the thousands, it is safe to conclude that they do not have the time to do what they claim to have done. 

These messages are just an attempt at a scam. In other words, if you receive such a blackmail message and do not pay, nothing more will obviously happen. 

Next, there is no need to change your email credentials. Your email address is usually already known and circulating on the Internet, because you use it regularly on different sites to identify yourself and to communicate. Some of these sites have resold or exchanged their address lists with more or less scrupulous marketing partners.

Finally, if the cybercriminals wrote to you from your own email address to make you believe that they have taken control of it: be aware that the sender address shown in a message is just a display field that can very easily be forged without much technical skill. 

In any case, the way to go is simple: don’t panic, don’t answer, don’t pay, just throw this mail in the trash (and don’t forget to empty it regularly). 

On the mail server side, setting up certain elements can help to prevent this kind of mail from spreading in the organization. This involves deploying the following measures on your mail server (an example of how to check the corresponding DNS records follows the list):

  •       SPF (Sender Policy Framework): This is a standard for verifying the domain name of the sender of an email (standardized in RFC 7208 [1]). The adoption of this standard is likely to reduce spam. It complements SMTP (Simple Mail Transfer Protocol), which does not provide a sender verification mechanism. SPF aims to reduce the possibility of spoofing by publishing a record in the DNS (Domain Name System) indicating which IP addresses are allowed or forbidden to send mail for the domain in question.
  •       DKIM (DomainKeys Identified Mail): This is a reliable authentication standard for the domain name of the sender of an email that provides effective protection against spam and phishing (standardized in RFC 6376 [2]). DKIM works with cryptographic signatures: it verifies the authenticity of the sending domain and also guarantees the integrity of the message.
  •       DMARC (Domain-based Message Authentication, Reporting and Conformance): This is a technical specification to help reduce email misuse by providing a solution for deploying and monitoring authentication issues (standardized in RFC 7489 [3]). DMARC standardizes the way recipients perform email authentication using the SPF and DKIM mechanisms.
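
As an illustration, you can verify which SPF, DKIM and DMARC records a domain publishes directly in DNS. The sketch below uses the dig utility against a placeholder domain; the record contents and the DKIM selector are purely illustrative:

# SPF is published as a TXT record at the domain apex
dig +short TXT example.com
"v=spf1 ip4:192.0.2.0/24 include:_spf.example.com -all"
# The DMARC policy lives under _dmarc.<domain>
dig +short TXT _dmarc.example.com
"v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
# DKIM public keys are published under <selector>._domainkey.<domain>
dig +short TXT selector1._domainkey.example.com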

 

REFERENCES

[1] S. Kitterman, “Sender Policy Framework (SPF),” ser. RFC7208, 2014, https://tools.ietf.org/html/rfc7208

[2] D. Crocker, T. Hansen, M. Kucherawy, “DomainKeys Identified Mail (DKIM) Signatures” ser. RFC6376, 2011,  https://tools.ietf.org/html/rfc6376

[3] M. Kucherawy, E. Zwicky, “Domain-based Message Authentication, Reporting and Conformance (DMARC)”, ser. RFC7489, 2015, https://tools.ietf.org/html/rfc7489

Cet article Email Spoofing est apparu en premier sur Blog dbi services.

Migrating your users from md5 to scram authentication in PostgreSQL

Thu, 2019-07-11 03:43

One of the new features in PostgreSQL 10 was the introduction of stronger password authentication based on SCRAM-SHA-256. How can you migrate your existing users that currently use md5 authentication to the new method without any interruption? Actually that is quite easy, as you will see in a few moments, but there is one important point to consider: not every client/driver already supports SCRAM-SHA-256 authentication, so you need to check that before. Here is the list of the drivers and their support for SCRAM-SHA-256.

The default method that PostgreSQL uses to encrypt password is defined by the “password_encryption” parameter:

postgres=# show password_encryption;
 password_encryption 
---------------------
 md5
(1 row)

Let’s assume we have a user that was created like this in the past:

postgres=# create user u1 login password 'u1';
CREATE ROLE

With the default method of md5 the hashed password looks like this:

postgres=# select passwd from pg_shadow where usename = 'u1';
               passwd                
-------------------------------------
 md58026a39c502750413402a90d9d8bae3c
(1 row)

As you can see the hash starts with md5, so we know that this hash was generated by the md5 algorithm. When we want this user to use scram-sha-256 instead, what do we need to do? The first step is to change the “password_encryption” parameter:

postgres=# alter system set password_encryption = 'scram-sha-256';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
postgres=# select current_setting('password_encryption');
 current_setting 
-----------------
 scram-sha-256
(1 row)

From now on the server will use scram-sha-256 and no longer md5 when hashing new passwords. But what happens when our user wants to connect to the instance once we have changed that? Currently this is defined in pg_hba.conf:

postgres=> \! grep u1 $PGDATA/pg_hba.conf
host    postgres        u1              192.168.22.1/24         md5

Even though the default is not md5 anymore the user can still connect to the instance because the password hash did not change for that user:

postgres=> \! grep u1 $PGDATA/pg_hba.conf
host    postgres        u1              192.168.22.1/24         md5

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -h 192.168.22.100 -p 5433 -U u1 postgres
Password for user u1: 
psql (13devel)
Type "help" for help.

postgres=> 

Once the user changed the password:

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -h 192.168.22.100 -p 5433 -U u1 postgres
Password for user u1: 
psql (13devel)
Type "help" for help.

postgres=> \password
Enter new password: 
Enter it again: 
postgres=> 

… the hash of the new password is not md5 but SCRAM-SHA-256:

postgres=# select passwd from pg_shadow where usename = 'u1';
                                                                passwd                               >
----------------------------------------------------------------------------------------------------->
 SCRAM-SHA-256$4096:CypPmOW5/uIu4NvGJa+FNA==$PNGhlmRinbEKaFoPzi7T0hWk0emk18Ip9tv6mYIguAQ=:J9vr5CQDuKE>
(1 row)

One could expect that from now on the user is not able to connect anymore as we did not change pg_hba.conf until now:

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -h 192.168.22.100 -p 5433 -U u1 postgres
Password for user u1: 
psql (13devel)
Type "help" for help.

postgres=> 

But in reality that still works, because when the stored password is a SCRAM verifier the server transparently performs SCRAM-SHA-256 authentication even though pg_hba.conf still says md5. So once all the users have changed their passwords you can safely switch the rule in pg_hba.conf and you’re done:

postgres=> \! grep u1 $PGDATA/pg_hba.conf
host    postgres        u1              192.168.22.1/24         scram-sha-256

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

You just need to make sure that no user still has a hash starting with md5 but only the new ones starting with SCRAM-SHA-256.
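
A quick way to find the users that still need to change their password is to look for the remaining md5 hashes in pg_shadow, e.g.:

psql -c "select usename from pg_shadow where passwd like 'md5%'"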

Cet article Migrating your users from md5 to scram authentication in PostgreSQL est apparu en premier sur Blog dbi services.

Huawei Dorado 6000 V3 benchmark

Wed, 2019-07-10 02:39

I had the opportunity to test the new Dorado 6000 V3 All-Flash storage system.
See what the all-new Dorado 6000 V3 All-Flash storage system is capable of as storage for your database system.

Before you read

This is a series of different blog posts:
In the first blog post, I talk about “What you should measure on your database storage and why”.
The second blog post will talk about “How to do database storage performance benchmark with FIO”.
The third blog post will show “How good is the new HUAWEI Dorado 6000V3 All-Flash System for databases”, measured with the methods and tools from posts one and two (aka this one here).

The first two posts give you the theory to understand all the graphics and numbers I will show in the third blog post.

So in this post we see the results of testing a Huawei Dorado 6000V3 All-Flash storage system with these techniques.

I uploaded all the files to a github repository: Huawei-Dorado6000V3-Benchmark.

Foreword

The setup was provided by Huawei in Shenzhen, China. I got remote access that timed out after a certain period. Every test run took 10 hours, and because of the timeout I was sometimes not able to capture all performance view pictures; that’s why some of the pictures are missing. The storage array and servers were provided free of charge, and Huawei did not influence the results or conclusions in any way.

Setup

4 servers were provided, each with 4x 16 GBit/s FC adapters directly connected to the storage system.
Each server has 256 GByte of memory and 2x 14-core 2.6 GHz Intel E5-2690 CPUs.
Hyperthreading is disabled.
The 10 GBit/s network interfaces are irrelevant for this test because all storage traffic runs over FC.

The Dorado 6000 V3 System has 1 TByte of cache and 50x 900 GByte SSD from Huawei.
Deduplication was disabled.
Tests were made with and without compression.

Theoretical max speed

With 4x16GBit/s a maximal throughput of 64 GBit/s or 8 GByte/s is possible.
In IOPS this means we can transmit 8192 IOPS with 1 MByte block size or 1’048’576 IOPS with 8 KByte block size.
As mentioned in the title, this is the theoretical or raw bandwidth; the usable bandwidth or payload is, of course, smaller: an FC frame is 2112 bytes with 36 bytes of protocol overhead.
So in a 64 GBit/s FC network we can transfer: 64GBit/s / 8 ==> 8GByte/s * 1024 ==> 8192 MByte/s (raw) * (100-(36/2.112))/100 ==> 6795MByte/s (payload).

So we end up with a maximum of 6975 IOPS@1MByte or 869’841 IOPS@8KByte (payload). Not included is the effect that we are using multipathing* with 4x 16GBit/s, which will also add some overhead.

*If somebody out there has a method to calculate the overhead of multipathing in such a setup, please contact me!

Single-Server Results General

All single server tests were made on devices with enabled data compression. Unfortunately, I do not have the results from my tests with uncompressed devices for single server anymore, but you can see the difference in the multi-server section.

8 KByte block size

The 8 KByte block size tests on a single server were very performant.
What we can already tell: the higher the parallelity, the better the storage performs. This is not really a surprise, most storage systems work better with higher parallel access.
Especially for 1 thread we see the difference between having one disk in a diskgroup and being able to use 3967 IOPS, or using e.g. 5 disks and 1 thread and being able to use 16700 IOPS.
The latency for all tests was great with 0.25ms to 0.4ms for read operations and 0.1 to 0.4ms for write operations.
The 0.1ms for write is not that impressive, because it is mainly the performance of the write cache, but even when we exceeded the write cache we were not higher than 0.4ms.

1 MByte block size

On the 1 MByte tests we see that we already hit the max speed with anywhere from 6 devices (parallelity of 6) to 9 devices (parallelity of 2).

As an example to interpret the graphic, when you have a look at the green line (6 devices), we reach the peak performance at a parallelity of 6.
For the dark blue line (7 devices) we hit the max peak at parallelity 4 and so on.

If we increase the parallelity beyond this point, the latency will grow and the throughput may even decrease.
For the 1 MByte tests, we hit a limitation at around 6280 IOPS. This is around 90% of the calculated maximum speed.

So if we go with Oracle ASM, we should bundle at least 5 devices together into a diskgroup.
We also see that when we run a diskgroup rebalance we should go for a small rebalance power. A value smaller than 4 should be chosen; every value over 8 is counterproductive, will consume all possible I/O on your system and will slow down all databases on this server.
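Purely as an illustration of these two points (the diskgroup name, disk paths and redundancy level below are hypothetical), bundling several devices into one ASM diskgroup and limiting the rebalance power could look like this:

sqlplus / as sysasm <<'EOF'
-- hypothetical diskgroup built from 5 multipath devices
create diskgroup DATA external redundancy
  disk '/dev/mapper/device01',
       '/dev/mapper/device02',
       '/dev/mapper/device03',
       '/dev/mapper/device04',
       '/dev/mapper/device05';

-- keep the rebalance power low so regular database I/O is not starved
alter diskgroup DATA rebalance power 4;
EOF
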
Monitoring / Verification

To verify the results, I am using dbms_io_calibration on the very same devices the performance test was running on. The expectation is that we will see more or less the same results.
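
Such a calibration run is typically started through DBMS_RESOURCE_MANAGER.CALIBRATE_IO; a minimal sketch could look like this (the number of physical disks and the latency target are example values only):

sqlplus / as sysdba <<'EOF'
set serveroutput on
declare
  l_max_iops pls_integer;
  l_max_mbps pls_integer;
  l_latency  pls_integer;
begin
  dbms_resource_manager.calibrate_io(
    num_physical_disks => 10,
    max_latency        => 20,
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  dbms_output.put_line('max_iops: ' || l_max_iops);
  dbms_output.put_line('max_mbps: ' || l_max_mbps);
  dbms_output.put_line('actual_latency: ' || l_latency);
end;
/
EOF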

On large IO the 6231 IOPS measured by IO calibration is almost the same as measured by FIO (+/- 1%).
IO calibration measured 604K IOPS for small IO, which is significantly more than the +/- 340k IOPS measured with FIO. This is explainable because IO calibration uses the number of disks as the parallelity and I did this test with 20 disks instead of 10. Sadly, when I realized my mistake I no longer had access to the system.

In the following pictures you see the performance view of the storage system with the data measured by FIO as an overlay. As we can see, the values for the IOPS match perfectly.
The value for latency was lower on the storage side, which is explainable by the different points where we are measuring (once on the storage side, once on the server side).
All screenshots of the live performance view of the storage can be found in the git repository. The values for queue depth, throughput and IOPS matched perfectly with the measured results.


Multi-Server Results with compression General

The tests for compressed and uncompressed devices were made with 3 parallel servers.

8 KByte block size

For random read with 8 KByte blocks, the IOPS increased almost linearly from 1 to 3 nodes and we hit a peak of 655’000 IOPS with 10 devices / 10 threads. The answer time was between 0.3 and 0.45 ms.
For random write, we hit some kind of limitation at around 250k IOPS. We could not get a higher value than that, which was kind of surprising for me. I would have expected better results here.
From the point where we hit the maximum number of IOPS we see the same behavior as with 1 MByte blocks: more threads only increase the answer time but do not get you better performance.
So for random write with 8 KByte blocks, the maximum numbers are around 3 devices and 10 threads or 10 devices and 3 threads, i.e. a parallelity of 30.
As long as we stay under this limit we see answer times between 0.15 and 0.5ms; over this limit the answer times can increase up to 10ms.
1 MByte block size

The multi-server tests show some interesting behavior with large reads on this storage system.
We hit a limitation at around 7500 to 7800 IOPS. For sequential writes, we could achieve almost double that with up to 14.5k IOPS.

Of course, I discussed all the results with Huawei to see their view on my tests.
The explanation for the much better performance on write than on read was: writes go straight to the 1 TByte cache, while for reads the system had to fetch everything from disk. This beta firmware version did not have any read cache yet, and that’s why the results were lower. All firmware versions starting from the end of February also have a read cache.
I accept this answer and hope to retest in the future with the newest firmware, still thinking that 7500 IOPS is a little bit low even without a read cache.
Multi-Server Results without compression

Comparing the results for compressed devices to uncompressed devices we see an increase of IOPS up to 30% and a decrease of latency at the same level for 8 KByte block size.
For 1 MByte sequential read, the difference was smaller with around 10%, for 1 MByte sequential write we could gain an increase of around 15-20%.

Multi-Server Results with high parallelity General

Because the tests with 3 servers did not max out the storage on the 8 KByte block size, I decided to do a max test with 4 parallel servers and with a parallelity from 1-100 instead of 1-10.
The steps were 1,5,10,15,20,30,40,50,75 and 100.
These tests were only performed on uncompressed devices.

8 KByte block size

It took 15 threads (per server) with 10 devices, 60 processes in total, to reach the peak performance of the Dorado 6000V3 system.
At this point, we reached 940k IOPS @0.637 ms for 8 KByte random read. Remembering the answer that this firmware version does not have any read cache, this performance is achieved completely from the SSDs and could theoretically be even better with a read cache enabled.
If we increase the parallelity further, we see the same effect as with 1 MByte blocks: the answer time is increasing (dramatically) and the throughput is decreasing.

Depending on the number of parallel devices, we need between 60 parallel processes (with 10 devices) up to 300 parallel processes (with 3 parallel devices).

1 MByte block size

For the large IOs, we see the same picture as with 1 or 3 servers. A combined parallelity of 20-30 can max out the storage system. So be very careful that your large IO tasks do not affect the other operations on the storage system.

Mixed Workload

After these tests we know the upper limits of this storage for isolated test cases. In a normal workload we will never see only one kind of IO: there will always be a mixture of 8 KByte read & write IOPS side by side with 1 MByte IO. To simulate this, we create two FIO job files. One creates approx. 40k-50k IOPS with random read and random write in a 50/50 split.
This will be our baseline; then we add approx. 1000 1 MByte IOPS every 60 seconds and see how the answer time reacts.
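
A minimal sketch of such a baseline job file, written out from the shell, could look like the following (the device name, rate cap and runtime are illustrative, not the exact values used for the test):

cat > mixed-baseline.fio <<'EOF'
[baseline-8k-randrw]
ioengine=libaio
direct=1
bs=8k
# 50/50 split between random reads and writes
rw=randrw
rwmixread=50
# cap the random IOPS rate (illustrative value)
rate_iops=45000
filename=/dev/mapper/device01
time_based
runtime=600
EOF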


As seen in this picture from the performance monitor of the storage system, the 1 MByte IOs had two effects on the smaller IOs:
  •       The throughput of the small IOs is decreasing
  •       The latency is increasing
In the middle of the test, we stop the small IOs to see the latency of just the 1 MByte IOs.

Both effects are expected and within the expected parameters: Test passed.

So with a base workload of 40k-50k IOPS, we can run e.g. backups in parallel with a bandwidth up to 5.5 GByte/s without interfering with the database work or we can do up to 5 active duplicates on the same storage without interfering with the other databases.

Summary

This storage system showed a fantastic performance at 8 KByte block size with very low latency. Especially the high number of parallel processes we can run against it before we hit the peak performance makes it a good choice for serving a large number of Oracle databases.

The large IO (1 MByte) performance for write operations was good, but not as impressive as the excellent 8 KByte performance. The sequential read part badly misses the read cache compared to the performance that is possible for writes. But even that is not top of the line compared to other storage systems: I have seen other storage systems with a comparable configuration deliver up to 12k IOPS@1MByte.

Remember the questions from the first blog post:
-How many devices should I bundle into a diskgroup for best performance?
As many as possible.

-How many backups/duplicates can I run in parallel to my normal database workload without interfering with it?
You can run 5 parallel backups/duplicates with 1000 IOPS each without interfering with a baseline of 40-50k IOPS@8KByte.

-What is the best rebalance power I can use on my system?
2-4 is absolutely enough for this system. More will slow down your other operations on the server.

Cet article Huawei Dorado 6000 V3 benchmark est apparu en premier sur Blog dbi services.

Storage performance benchmarking with FIO

Wed, 2019-07-10 02:18

Learn how to do storage performance benchmarks for your database system with the open source tool FIO.

Before you read

This is a series of different blog posts:
In the first blog post, I talk about “What you should measure on your database storage and why”.
The second blog post will talk about “How to do database storage performance benchmark with FIO” (aka this one here).
The third blog post will show “How good is the new HUAWEI Dorado 6000V3 All-Flash System for databases”, measured with the methods and tools from posts one and two.

The first two posts give you the theory to understand all the graphics and numbers I will show in the third blog post.

Install FIO

Many distributions have FIO in their repositories. On a Fedora/RHEL system, you can just use
yum install fio
and you are ready to go.
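
On Debian- or Ubuntu-based systems the package usually has the same name, so something like the following should do (assuming the standard repositories):
sudo apt-get install fio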

Start a benchmark with FIO

There are mainly two different ways to start a benchmark with FIO

Command line

Starting from the command line is the way to go when you just want to get a quick feeling for the system performance.
I prefer to do more complex setups with job files; they are easier to create and debug.
Here is a small example of how to start a benchmark directly from the command line:
fio --filename=/dev/xvdf --direct=1 --rw=randwrite --refill_buffers --norandommap \
--randrepeat=0 --ioengine=libaio --bs=128k --rate_iops=1280 --iodepth=16 --numjobs=1 \
--time_based --runtime=86400 --group_reporting --name=benchtest

FIO Job files

An FIO job file holds a [GLOBAL] section and one or many [JOBS] sections. The global section holds the shared parameters which are used for all jobs as long as you do not override them in the job sections.
Here is what a typical global section from my files looks like:
[global]
ioengine=libaio    #ASYNCH IO
invalidate=1       #Invalidate buffer-cache for the file prior to starting I/O.
                   #Should not be necessary because of direct IO but just to be sure ;-)
ramp_time=5        #First 5 seconds do not count to the result.
iodepth=1          #Number of I/O units to keep in flight against the file
runtime=60         #Runtime for every test
time_based         #If given, run for the specified runtime duration even if the files are completely read or written.
                   #The same workload will be repeated as many times as runtime allows.
direct=1           #Use non buffered I/O.
group_reporting=1  #If set, display per-group reports instead of per-job when numjobs is specified.
per_job_logs=0     #If set, this generates bw/clat/iops log with per file private filenames.
                   #If not set, jobs with identical names will share the log filename.
bs=8k              #Block size
rw=randread        #I/O Type

Now that we have defined the basics, we can start with the JOBS section:
Example of single device test with different parallelity:


#
#Subtest: 1
#Total devices = 1
#Parallelity = 1
#Number of processes = devices*parallelity ==> 1*1 ==> 1
#
[test1-subtest1-blocksize8k-threads1-device1of1]     #Parallelity 1, Number of device: 1/1
stonewall                               #run this test until the next [JOB SECTION] with the “stonewall” keyword
filename=/dev/mapper/device01           #Device to use
numjobs=1                               #Create the specified number of clones of this job.
                                        #Each clone of job is spawned as an independent thread or process.
                                        #May be used to setup a larger number of threads/processes doing the same thing.
                                        #Each thread is reported separately: to see statistics for all clones as a whole
                                        #use group_reporting in conjunction with new_group.
#
#Subtest: 5
#Total devices = 1
#Parallelity = 5
#Number of processes = devices*parallelity ==> 1*5 ==> 5
#
[test1-subtest5-blocksize8k-threads5-device1of1]     #Parallelity 5, Number of device: 1/1
stonewall
numjobs=5
filename=/dev/mapper/device01

Example of multi device test with different parallelity:

#Subtest: 1
#Total devices = 4
#Parallelity = 1
#Number of processes = devices*parallelity ==> 4
#
[test1-subtest1-blocksize8k-threads1-device1of4]     # Parallelity 1, Number of device 1/4
stonewall
numjobs=1
filename=/dev/mapper/device01
[test1-subtest1-blocksize8k-threads1-device2of4]     # Parallelity 1, Number of device 2/4
numjobs=1
filename=/dev/mapper/device02
[test1-subtest1-blocksize8k-threads1-device3of4]     # Parallelity 1, Number of device 3/4
numjobs=1
filename=/dev/mapper/device03
[test1-subtest1-blocksize8k-threads1-device4of4]     # Parallelity 1, Number of device 4/4
numjobs=1
filename=/dev/mapper/device04
#
#Subtest: 5
#Total devices = 3
#Parallelity = 5
#Number of processes = devices*parallelity ==> 5
#
[test1-subtest5-blocksize8k-threads5-device1of3]     # Parallelity 5, Number of device 1/3
stonewall
numjobs=5
filename=/dev/mapper/device01
[test1-subtest5-blocksize8k-threads5-device2of3]     # Parallelity 5, Number of device 2/3
filename=/dev/mapper/device02
[test1-subtest5-blocksize8k-threads5-device3of3]     # Parallelity 5, Number of device 3/3
filename=/dev/mapper/device03

You can download a complete set of FIO job files for running the described test case from my github repository: Huawei-Dorado6000V3-Benchmark.
Job files list

To run a complete test with my job files you have to replace the devices. There is a small shell script to replace the devices called “replaceDevices.sh”

#!/bin/bash
######################################################
# dbi services michael.wirz@dbi-services.com
# Vesion: 1.0
#
# usage: ./replaceDevices.sh
#
# todo before use: modify newname01-newname10 with
# the name of your devices
######################################################
sed -i -e 's_/dev/mapper/device01_/dev/mapper/newname01_g' *.fio
sed -i -e 's_/dev/mapper/device02_/dev/mapper/newname02_g' *.fio
sed -i -e 's_/dev/mapper/device03_/dev/mapper/newname03_g' *.fio
sed -i -e 's_/dev/mapper/device04_/dev/mapper/newname04_g' *.fio
sed -i -e 's_/dev/mapper/device05_/dev/mapper/newname05_g' *.fio
sed -i -e 's_/dev/mapper/device06_/dev/mapper/newname06_g' *.fio
sed -i -e 's_/dev/mapper/device07_/dev/mapper/newname07_g' *.fio
sed -i -e 's_/dev/mapper/device08_/dev/mapper/newname08_g' *.fio
sed -i -e 's_/dev/mapper/device09_/dev/mapper/newname09_g' *.fio
sed -i -e 's_/dev/mapper/device10_/dev/mapper/newname10_g' *.fio

!!! After you have replaced the filenames you should double-check that you have the correct devices, because when you start the test, all data on these devices is lost !!!

grep filename *.fio|awk -F '=' '{print $2}'|sort -u
/dev/mapper/device01
/dev/mapper/device02
/dev/mapper/device03
/dev/mapper/device04
/dev/mapper/device05
/dev/mapper/device06
/dev/mapper/device07
/dev/mapper/device08
/dev/mapper/device09
/dev/mapper/device10

To start the test run:

for job_file in $(ls *.fio)
do
    fio ${job_file} --output /tmp/bench/${job_file%.fio}.txt
done

Multiple Servers

FIO supports running tests on multiple servers in parallel, which is very nice! Often a single server cannot max out a modern all-flash storage system; this can be due to bandwidth problems (e.g. not enough adapters per server) or because one server is just not powerful enough.

You need to start FIO in server mode on all machines you want to test:
fio --server

Then you start the test with
fio --client=serverA,serverB,serverC /path/to/fio_jobs.file

Should you have a lot of servers you can put them in a file and use this as input for your fio command:


cat fio_hosts.list
serverA
serverB
serverC
serverD
...

fio --client=fio_hosts.list /path/to/fio_jobs.file

Results

The output files are not really human readable, so you can use my getResults.sh script, which formats the output ready to copy/paste into Excel:


cd /home/user/Huawei-Dorado6000V3-Benchmark/TESTRUN5-HOST1_3-COMPR/fio-benchmark-output
bash ../../getResults.sh
###########################################
START :Typerandread-BS8k
FUNCTION: getResults
###########################################
Typerandread-BS8k
LATENCY IN MS
.399 .824 1.664 2.500 3.332 5.022 6.660 8.316 12.464 16.683
.392 .826 1.667 2.495 3.331 4.995 6.680 8.344 12.474 16.637
.397 .828 1.661 2.499 3.330 4.992 6.656 8.329 12.505 16.656
.391 .827 1.663 2.493 3.329 5.002 6.653 8.330 12.482 16.656
.398 .827 1.663 2.497 3.327 5.005 6.660 8.327 12.480 16.683
.403 .828 1.662 2.495 3.326 4.995 6.663 8.330 12.503 16.688
.405 .825 1.662 2.496 3.325 4.997 6.648 8.284 12.369 16.444
.417 .825 1.661 2.497 3.326 4.996 6.640 8.256 12.303 16.441
.401 .826 1.661 2.500 3.327 4.999 6.623 8.273 12.300 16.438
.404 .826 1.661 2.500 3.327 4.993 6.637 8.261 12.383 16.495
IOPS
2469 6009 5989 5986 5991 5966 5998 6006 6012 5989
5004 12000 11000 11000 11000 11000 11000 11000 12000 12000
7407 17000 18000 17000 17000 18000 18000 17000 17000 17000
10000 23000 23000 24000 23000 23000 24000 23000 24000 23000
12300 29000 29000 29000 30000 29900 29000 29000 30000 29900
14600 35900 35000 35000 36000 35000 35000 35000 35000 35900
16000 42100 41000 41000 42000 41000 42100 42200 42400 42500
16500 42100 41000 41900 42000 41000 42100 42400 42600 42500
19600 48000 47000 47900 47000 47900 48300 48300 48700 48600
21900 54000 53000 53900 53000 53000 54200 54400 54400 54400
###########################################
START :Typerandwrite-BS8k
FUNCTION: getResults
###########################################
Typerandwrite-BS8k
LATENCY IN MS
.461 .826 1.662 2.501 3.332 5.022 6.660 8.317 12.467 16.676
.457 .826 1.668 2.495 3.330 5.002 6.681 8.346 12.473 16.635
.449 .826 1.662 2.499 3.327 4.991 6.664 8.326 12.497 16.649
.456 .828 1.661 2.496 3.331 4.997 6.663 8.329 12.477 16.651
.460 .827 1.663 2.495 3.327 5.001 6.660 8.333 12.484 16.676
.463 .830 1.663 2.495 3.325 4.997 6.661 8.330 12.503 16.684
.474 .827 1.661 2.495 3.324 4.999 6.665 8.334 12.451 16.580
.469 .828 1.661 2.497 3.324 5.002 6.668 8.322 12.489 16.594
.471 .827 1.660 2.499 3.327 4.998 6.663 8.335 12.481 16.609
.476 .825 1.675 2.500 3.328 4.992 6.675 8.334 12.480 16.623
IOPS
2137 5997 5990 5985 5991 5966 5998 6005 6010 5992
4306 12000 11900 11000 11000 11000 11000 11000 12000 12000
6571 17000 17000 17000 18000 18000 17000 17000 17000 18000
8635 23900 23000 23000 23000 23000 23000 23000 24000 24000
10700 29000 29000 29000 30000 29900 29000 29000 30000 29000
12800 35900 35000 35000 36000 35000 35000 35000 35000 35900
14500 41000 41000 41000 42000 41000 41000 41000 42100 42200
14700 41000 41000 41900 42000 41900 41900 42000 42000 42100
16700 48000 48000 47900 47000 47000 47000 47900 47000 48100
18600 54100 53500 53900 53000 54000 53900 53900 53000 54100
...

Copy & paste the result into the Excel template and you get an easy overview of the results:
fio summary excel

Troubleshooting

If you’ve got a libaio error you have to install the libaio libraries:

fio: engine libaio not loadable
fio: failed to load engine
fio: file:ioengines.c:89, func=dlopen, error=libaio: cannot open shared object file: No such file or directory

yum install libaio-devel

Cet article Storage performance benchmarking with FIO est apparu en premier sur Blog dbi services.

Converting columns from one data type to another in PostgreSQL

Mon, 2019-07-08 00:19

Usually you should use the data type that best fits the representation of your data in a relational database. But how many times have you seen applications that store dates or numbers as text, or dates as integers? This is not as uncommon as you might think, and fixing it can be quite a challenge, as you need to cast from one data type to another when you want to change the data type used for a specific column. Depending on the current format of the data it might be easy to fix or it might become more complicated. PostgreSQL has a quite clever way of doing that.

Frequent readers of our blog might know that already: We start with a simple, reproducible test setup:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

What do we have here? A simple table with two columns: Column “a” is an integer and column “b” is of type text. For humans it seems obvious that the second column in reality contains a date but stored as text. What options do we have to fix that? We could try something like this:

postgres=# alter table t1 add column c date default (to_date(b,'YYYYMMDD'));
psql: ERROR:  cannot use column reference in DEFAULT expression

That obviously does not work. Another option would be to add another column with the correct data type, populate that column and then drop the original one:

postgres=# alter table t1 add column c date;
ALTER TABLE
postgres=# update t1 set c = to_date(b,'YYYYMMDD');
UPDATE 3
postgres=# alter table t1 drop column b;
ALTER TABLE

But what is the downside of that? This will probably break the application as the column name changed and there is no way to avoid that. Is there a better way of doing that? Let’s start from scratch:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

The same setup as before. What other options do we have to convert "b" to a real date without changing the name of the column? Let's try the most obvious way and let PostgreSQL decide what to do:

postgres=# alter table t1 alter column b type date;
psql: ERROR:  column "b" cannot be cast automatically to type date
HINT:  You might need to specify "USING b::date".

This does not work, as PostgreSQL in this case cannot know how to go from one data type to another. But the “HINT” already tells us what we might need to do:

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# 

For our data in the “b” column that does work, but consider you have data like this:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'01-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (2,'02-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (3,'03-JAN-2019');
INSERT 0 1
postgres=# select * from t1;
 a |      b      
---+-------------
 1 | 01-JAN-2019
 2 | 02-JAN-2019
 3 | 03-JAN-2019
(3 rows)

Would that still work?

postgres=# alter table t1 alter column b type date using (b::date);;
ALTER TABLE
postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-02
 3 | 2019-01-03
(3 rows)

Yes, but in this case it will not:

DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'First--January--19');
INSERT 0 1
postgres=# insert into t1 values (2,'Second--January--19');
INSERT 0 1
postgres=# insert into t1 values (3,'Third--January--19');
INSERT 0 1
postgres=# select * from t1;
 a |          b           
---+---------------------
 1 | First--January--19
 2 | Second--January--19
 3 | Third--January--19
(3 rows)

postgres=# alter table t1 alter column b type date using (b::date);;
psql: ERROR:  invalid input syntax for type date: "First--January--19"
postgres=# 

As PostgreSQL has no idea how to do the conversion, this will fail, no surprise here. But you still have the power to do it by providing a function that does the conversion in exactly the way you want:

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
begin
  return date('20190101');
end;
$$ language plpgsql;

Of course you would add logic to parse the input string so that the function will return the matching date and not a constant as in this example. For demonstration purposes we will go with this fake function:

postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));;
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-01
 3 | 2019-01-01
(3 rows)

… and here we go. The column was converted from text to date and we provided the exact way of doing that by calling a function that contains the conversion logic. As long as the output of the function conforms to the data type you want, and you did not make any mistakes, you can potentially go from any source data type to any target data type.
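
Just to illustrate what such a conversion function could look like for the ‘First--January--19’ strings used above, here is a sketch (it assumes only the ordinals first/second/third ever appear and that all years belong to the 21st century):

psql <<'SQL'
create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
  lv_day    text := split_part(pv_text, '--', 1);
  lv_month  text := split_part(pv_text, '--', 2);
  lv_year   text := split_part(pv_text, '--', 3);
  lv_daynum int;
begin
  -- map the ordinal word to a day number (only a few are handled in this sketch)
  lv_daynum := case lower(lv_day)
                 when 'first'  then 1
                 when 'second' then 2
                 when 'third'  then 3
               end;
  return to_date(lv_daynum || '-' || lv_month || '-20' || lv_year, 'DD-Month-YYYY');
end;
$$ language plpgsql;
SQL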

There is one remaining question: Will that block other sessions selecting from the table while the conversion is ongoing?

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 select a, '20190101' from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create index i1 on t1(a);
CREATE INDEX

In one session we will do the conversion and in the other session we will do a simple select that goes over the index:

-- first session
postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));

Second one at the same time:

-- second session
postgres=# select * from t1 where a = 1;
-- blocks

Yes, that will block, so you should plan such actions carefully when you have a busy system. But this is still better than adding a new column.

Cet article Converting columns from one data type to another in PostgreSQL est apparu en premier sur Blog dbi services.

Telling the PostgreSQL optimizer more about your functions

Sun, 2019-07-07 05:29

When you reference/call functions in PostgreSQL the optimizer does not really know much about the cost nor the number of rows that a function returns. This is not really surprising, as it is hard to predict what the function is doing and how many rows will be returned for a given set of input parameters. What you might not know is that you can indeed tell the optimizer a bit more about your functions.

As usual let’s start with a little test setup:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select a,a::text,now() from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create unique index i1 on t1(a);
CREATE INDEX
postgres=# analyze t1;
ANALYZE

A simple table containing 1’000’000 rows and one unique index. In addition let’s create a simple function that will return exactly one row from that table:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql;

What is the optimizer doing when you call that function?

postgres=# explain (analyze) select f_tmp (1);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.654..0.657 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 0.696 ms
(4 rows)

We know that only one row will be returned, but the optimizer is assuming that 1000 rows will be returned. This is the documented default. So, no matter how many rows will really be returned, PostgreSQL will always estimate 1000. But you do have some control and can tell the optimizer that the function will return one row only:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1;

Looking again at the execution plan again:

postgres=# explain (analyze) select f_tmp (1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..0.27 rows=1 width=32) (actual time=0.451..0.454 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 0.503 ms
(4 rows)

Instead of 1000 rows we now see that only 1 row was estimated, which is what we specified when we created the function. Of course this is a very simple example and in reality you often might not be able to tell exactly how many rows will be returned from a function. But at least you can provide a better estimate than the default of 1000. In addition you can also specify a cost for your function (in units of cpu_operator_cost):

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1
   cost 1;

If you use functions, remember that you can give the optimizer more information and that the default row estimate is 1000.
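
The estimates you configured can be checked afterwards in pg_proc, which stores them in the procost and prorows columns:

psql -c "select proname, procost, prorows from pg_proc where proname = 'f_tmp'"

For the last version of f_tmp above this should report a procost of 1 and a prorows value of 1.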

Cet article Telling the PostgreSQL optimizer more about your functions est apparu en premier sur Blog dbi services.

SQL Server containers and docker network driver performance considerations

Fri, 2019-07-05 01:45

A few months ago I attended the Franck Pachot session about microservices and databases at SOUG Romandie in Lausanne on May 21st, 2019. He covered some performance challenges that can be introduced by a microservices architecture design, especially when database components come into the game with chatty applications. One year ago, I was in a situation where a customer installed some SQL Server 2017 Linux containers in a Docker infrastructure with user applications located outside of this infrastructure. It is likely an uncommon way to start with containers, but anyway, when you are immersing yourself in the Docker world you quickly notice there are a lot of network drivers and considerations you should be aware of. Just for the sake of curiosity, I proposed to my customer to perform some network benchmark tests to get a clear picture of these network drivers and their related overhead, in order to design the Docker infrastructure correctly from a performance standpoint.

The customer’s initial scenario included a standalone Docker infrastructure and we considered different application network configurations from a performance perspective. We did the same for the second scenario, which concerned a Docker Swarm infrastructure we installed in a second step.

The Initial reference – Host network and Docker host network

The first point was to get an initial reference with no network management overhead, directly from the network host. We used the iperf3 tool for the tests. This is the kind of tool I also use with virtual environments to ensure the network throughput is what we really expect, and I have had some surprises on this topic. So, back to the container world: each test was performed from a Linux host outside the Docker infrastructure, according to the customer scenario.

The link speed of the Docker host’s attached network card is supposed to be 10 GBits/sec …

$ sudo ethtool eth0 | grep "Speed"
        Speed: 10000Mb/s

 

… and it is confirmed by the first iperf3 output below:
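
For reference, each measurement was taken by running an iperf3 client from the external Linux host against the Docker host; a typical invocation looks like the sketch below (the target address, test duration and number of parallel streams are illustrative):

iperf3 -c <docker_host_ip> -t 30 -P 4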

Let’s say that we tested the Docker host driver as well and we got similar results.

$ docker run  -it --rm --name=iperf3-server  --net=host networkstatic/iperf3 -s

 

Docker bridge mode

The default modus operandi for a Docker host is to create a virtual ethernet bridge (called docker0), attach each container’s network interface to the bridge, and use network address translation (NAT) when containers need to make themselves visible to the Docker host and beyond. Unless specified otherwise, a docker container will use it by default, and this is exactly the network driver used by the containers in the context of my customer. In fact, we used a user-defined bridge network, but I would say it doesn’t matter for the tests we performed here.

$ ip addr show docker0
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:70:0a:e8:7a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:70ff:fe0a:e87a/64 scope link
       valid_lft forever preferred_lft forever

 

The iperf3 docker container I ran for my tests is using the default bridge network as shown below. The interface with index 24 corresponds to the veth0bfc2dc peer of the concerned container.

$ docker run  -d --name=iperf3-server -p 5204:5201 networkstatic/iperf3 -s
…
$ docker ps | grep iperf
5c739940e703        networkstatic/iperf3              "iperf3 -s"              38 minutes ago      Up 38 minutes                0.0.0.0:5204->5201/tcp   iperf3-server
$ docker exec -ti 5c7 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

[clustadmin@docker1 ~]$ ethtool -S veth0bfc2dc
NIC statistics:
     peer_ifindex: 24

 

Here is the output after running the iperf3 benchmark:

It’s worth noting that the bridge network adds some overhead, with an impact of 13% in my tests, but this is in fact an expected outcome, especially if we refer to the Docker documentation:

Compared to the default bridge mode, the host mode gives significantly better networking performance since it uses the host’s native networking stack whereas the bridge has to go through one level of virtualization through the docker daemon.

 

When the docker-proxy comes into play

The next scenario we wanted to test concerned the closest network proximity we may have between the user applications and the SQL Server containers in the Docker infrastructure. In other words, we assumed the application resides on the same host as the SQL Server container, and we got some surprises from the docker-proxy itself.

Before looking at the iperf3 result, I think we have to answer the million-dollar question here: what is the docker-proxy? Have you ever paid attention to this process on your docker host? Let’s run a pstree command:

$ pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─agetty
        ├─auditd───{auditd}
        ├─containerd─┬─containerd-shim─┬─npm─┬─node───9*[{node}]
        │            │                 │     └─9*[{npm}]
        │            │                 └─12*[{containerd-shim}]
        │            ├─containerd-shim─┬─registry───9*[{registry}]
        │            │                 └─10*[{containerd-shim}]
        │            ├─containerd-shim─┬─iperf3
        │            │                 └─9*[{containerd-shim}]
        │            └─16*[{containerd}]
        ├─crond
        ├─dbus-daemon
        ├─dockerd─┬─docker-proxy───7*[{docker-proxy}]
        │         └─20*[{dockerd}]

 

Well, if I understand the Docker documentation correctly, the purpose of this process is to enable a service consumer to communicate with the service-providing container … but it is only used in particular circumstances. Just bear in mind that access to a container’s service is mostly controlled through the host netfilter framework, in both the NAT and filter tables, and the docker-proxy mechanism is required only when this method of control is not available:

  •       When the Docker daemon is started with --iptables=false or --ip-forward=false, or when the Linux host cannot act as a router because the Linux kernel parameter net.ipv4.ip_forward is set to 0. This is not my case here.
  •       When you are using localhost in the connection string of your application, which implies using the loopback interface (127.0.0.0/8), and the kernel doesn’t allow routing traffic from it. Therefore, it’s not possible to apply netfilter NAT rules and instead netfilter sends packets through the filter table’s INPUT chain to a local process listening on the docker-proxy port.
$ sudo iptables -L -n -t nat | grep 127.0.0.0
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

 

In the picture below you will notice I’m using the localhost keyword in my connection string, so the docker-proxy comes into play.

A huge performance impact for sure, about 28%. This performance drop may be explained by the fact that the docker-proxy process is consuming 100% of my CPUs:

The docker-proxy operates in userland and I may simply disable it with the docker daemon parameter "userland-proxy": false, but I would say this is a case we would not encounter in practice because applications will never use localhost in their connection strings. By the way, changing the connection string from localhost to the IP address of the container host gives a very different outcome, similar to the Docker bridge network scenario.
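
For completeness, disabling the userland proxy is done in the daemon configuration; a minimal sketch of /etc/docker/daemon.json could look like this (remember the daemon has to be restarted afterwards):

$ cat /etc/docker/daemon.json
{
  "userland-proxy": false
}
$ sudo systemctl restart docker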

 

Using an overlay network

Using a single docker host doesn’t fit well with HA or scalability requirements, and in a mission-critical environment I strongly doubt any customer would go this way. I recommended that my customer consider using an orchestrator like Docker Swarm or K8s to anticipate the container workload coming from future projects. The customer picked Docker Swarm for its easier implementation compared to K8s.

 

After implementing a proof of concept for testing purposes (3 nodes: one manager and two worker nodes), we took the opportunity to measure the potential overhead implied by the overlay network, which is the common driver used by containers through stacks and services in such a situation. Referring to the Docker documentation, overlay networks manage communication among the Docker daemons participating in the swarm and are used by the services deployed on it. Here are the docker nodes in the swarm infrastructure:

$ docker node ls
ID                            HOSTNAME                    STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
vvdofx0fjzcj8elueoxoh2irj *   docker1.dbi-services.test   Ready               Active              Leader              18.09.5
njq5x23dw2ubwylkc7n6x63ly     docker2.dbi-services.test   Ready               Active                                  18.09.5
ruxyptq1b8mdpqgf0zha8zqjl     docker3.dbi-services.test   Ready               Active                                  18.09.5

 

An ingress overlay network is created by default when setting up a swarm cluster. User-defined overlay networks may be created afterwards and extend to the other nodes only when needed by containers.

$ docker network ls | grep overlay
NETWORK ID    NAME              DRIVER   SCOPE
ehw16ycy980s  ingress           overlay  swarm
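
For reference, a user-defined overlay network for application services can be created like this (the network name is illustrative; --attachable additionally allows standalone containers to connect to it):

$ docker network create --driver overlay --attachable app_net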

 

Here is the result of the iperf3 benchmark:

Well, the same result as the previous test, with a roughly 30% performance drop. Compared to the initial reference this is again an expected outcome, but I didn’t imagine how big the impact could be in such a case. The overlay network introduces additional overhead by putting together behind the scenes a VXLAN tunnel (a virtual Layer 2 network on top of an existing Layer 3 infrastructure), VTEP endpoints for encapsulation/de-encapsulation and traffic encryption by default.

Here is a summary of the different scenarios and their performance impact:

Scenario                  Throughput (GB/s)   Performance impact
Host network              10.3
Docker host network       10.3
Docker bridge network     8.93                0.78
Docker proxy              7.37                0.71
Docker overlay network    7.04                0.68

 

In the particular case of my customer, where the SQL Server instances sit on the Docker infrastructure and the applications reside outside of it, it’s clear that using the Docker host network directly may be a good option from a performance standpoint, assuming this infrastructure remains simple with a few SQL Server containers. In this case we have to change the SQL Server default listen port with the MSSQL_TCP_PORT parameter, because Docker host networking doesn’t provide port mapping capabilities. According to our tests, we didn’t see any evidence of an improvement in application response time between the Docker network drivers, probably because those applications are not network bound here, but I can imagine scenarios where they could be. Finally, the kind of scenario encountered here is likely uncommon; I more often see containerized apps with database components outside the Docker infrastructure, but that doesn’t change the game at all and the same considerations apply. Today I’m very curious to test real microservices scenarios where database and application components are all sitting on a Docker infrastructure.

See you!

 

Cet article SQL Server containers and docker network driver performance considerations est apparu en premier sur Blog dbi services.
