Monday, 27 September 2021

High Availability and DR for SAP HANA, SAP S/4HANA, and SAP Central Services

Because SAP applications touch many critical parts of a company, such as its ERP, manufacturing, business processes, and customer service, they have become the lifeline of many enterprises that depend on them to operate. As a result, high availability has become one of the top concerns of company management when it comes to their SAP systems.

In this article, we will discuss at a high level what HANA system replication is, how it works, what its limitations are when it comes to high availability, and how we can overcome them. We will also discuss the high-availability options for SAP HANA, SAP S/4HANA, and SAP Central Services and their key differences, so that you can choose the right tool for the right job.

To select the right HA solution, these are some of the key questions you need to ask yourself:

  • Meet Recovery Time Objectives (RTO): how long can SAP be down before you recover?
  • Meet Recovery Point Objectives (RPO): how old can your data be when service is restored?
  • Meet Availability Service Level Agreements (SLA): how much uptime do you need?
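To put the SLA question into numbers, the Python sketch below converts an availability percentage into the downtime budget it allows. It is plain arithmetic. It assumes a 365-day year and a 30-day month as the measurement windows, and your SLA contract may define its window differently.

```python
# Convert an availability SLA into the downtime budget it allows.
# Assumption: a 365-day year and a 30-day month are the measurement windows.

HOURS_PER_YEAR = 365 * 24
MINUTES_PER_MONTH = 30 * 24 * 60


def downtime_budget(availability_percent: float) -> tuple[float, float]:
    """Return (hours of downtime per year, minutes of downtime per month)."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return (HOURS_PER_YEAR * unavailable_fraction,
            MINUTES_PER_MONTH * unavailable_fraction)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95, 99.99):
        per_year, per_month = downtime_budget(sla)
        print(f"{sla}% -> {per_year:.2f} h/year, {per_month:.1f} min/month")
```

For example, 99.9% availability leaves about 8.76 hours of downtime per year, or about 43 minutes per month. Every failure and its recovery time (RTO) have to fit into that budget.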

SAP HANA system replication

With SAP HANA System Replication, you can continuously synchronize an SAP HANA database to a secondary location. The secondary can be in the same data center, at a remote site, or in the cloud, which makes System Replication a reliable data protection and disaster recovery solution.

System Replication is a standard SAP HANA feature that comes with the software. With this feature, all data is replicated to the secondary site and pre-loaded into memory there, which helps to reduce the recovery time objective (RTO) significantly. In case of a failover, the secondary site can take over without a full HANA database restart and starts working as the primary database almost immediately. However, the takeover has to be triggered manually by an administrator with the "hdbnsutil -sr_takeover" command. Separate commands also have to be issued to reverse the replication or to fail back to the original primary.
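The manual sequence described above can be sketched in Python as a plan builder. It only lists the commands an administrator would run as the <sid>adm user and does not execute anything. The hdbnsutil options shown are standard HSR commands: -sr_takeover, -sr_register with --remoteHost, --remoteInstance, --replicationMode, --operationMode and --name, and -sr_state. The site names, hostnames, and instance number are hypothetical. Check the exact options against the SAP HANA Administration Guide for your revision.

```python
# Build (but do not execute) the manual HSR takeover and reverse-replication
# sequence. All commands are meant to be run as the <sid>adm user.
# Site names, hostnames, and the instance number are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str      # HSR site name used with --name (hypothetical)
    host: str      # hostname of the HANA server at this site
    instance: str  # two-digit HANA instance number, e.g. "00"


def takeover_plan(old_primary: Site, new_primary: Site,
                  replication_mode: str = "sync",
                  operation_mode: str = "logreplay") -> list[tuple[str, str]]:
    """Return (host, command) pairs in the order an admin would run them."""
    register = (
        f"hdbnsutil -sr_register --remoteHost={new_primary.host} "
        f"--remoteInstance={new_primary.instance} "
        f"--replicationMode={replication_mode} "
        f"--operationMode={operation_mode} --name={old_primary.name}"
    )
    return [
        # 1. Ensure the old primary is stopped (if it is still reachable),
        #    so that two primaries never accept writes at the same time.
        (old_primary.host, "HDB stop"),
        # 2. Promote the secondary to primary.
        (new_primary.host, "hdbnsutil -sr_takeover"),
        # 3. After repair, register the old primary as the new secondary.
        #    This reverses the replication direction.
        (old_primary.host, register),
        # 4. Start it so it resynchronizes from the new primary.
        (old_primary.host, "HDB start"),
        # 5. Check the replication state on the new primary.
        (new_primary.host, "hdbnsutil -sr_state"),
    ]


if __name__ == "__main__":
    site_a = Site("SITEA", "hana01", "00")
    site_b = Site("SITEB", "hana02", "00")
    for host, command in takeover_plan(old_primary=site_a, new_primary=site_b):
        print(f"[{host}] {command}")
```

Failing back to the original primary uses the same plan with the two sites swapped, takeover_plan(site_b, site_a). That is the second round of manual commands this paragraph refers to.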

Figure 1: SAP HANA System Replication failover high-availability and DR

Below are some key points of the SAP HANA system replication method for HA and DR:

  • Redundant Servers / Nodes
  • In-memory database replicated by HANA system replication (in “log replay” mode)
  • Multiple replication modes: sync, syncmem, async
  • Supports active-active (read-only on secondary)
  • Setup and admin through HANA cockpit, HANA studio or command line
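The practical difference between the replication modes (SYNC, SYNCMEM, ASYNC) is when the primary treats a commit as complete, and that determines your RPO. The Python sketch below is a simplified model, not an SAP API. Its enum values mirror the values accepted by hdbnsutil's --replicationMode option. It assumes the secondary was fully in sync before the failure and ignores the primary continuing alone after a replication timeout.

```python
from enum import Enum


class ReplicationMode(Enum):
    SYNC = "sync"        # commit waits until the secondary has persisted the log
    SYNCMEM = "syncmem"  # commit waits until the secondary has the log in memory
    ASYNC = "async"      # commit does not wait for the secondary


def committed_data_at_risk(mode: ReplicationMode,
                           primary_lost: bool,
                           secondary_lost_at_same_time: bool) -> bool:
    """Simplified model: can a takeover lose already-committed transactions?

    Assumes the secondary was in sync before the failure and ignores the
    case where the primary keeps committing after the replication
    connection has timed out.
    """
    if not primary_lost:
        return False  # no takeover needed, nothing is lost
    if mode is ReplicationMode.ASYNC:
        # Log committed on the primary may not have reached the secondary yet.
        return True
    if mode is ReplicationMode.SYNCMEM:
        # Log sits only in the secondary's memory until written to disk,
        # so losing both sites at once can lose committed data.
        return secondary_lost_at_same_time
    return False  # SYNC: the log is already persisted on the secondary
```

In short, SYNC gives an RPO of zero for in-sync systems, SYNCMEM risks data only if both sites fail simultaneously, and ASYNC trades some possible data loss for lower commit latency.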

Limitations

  • No monitoring of application processes or replication failures, and no automated failover
  • Failover, reverse replication, and failback have to be performed manually, with many manual steps
  • No virtual IP
  • No integrated HA failover orchestration with SAP Central Services and other components

As you can probably deduce from the points above, SAP HANA system replication is designed to protect against data loss. When an issue occurs on the primary node, an administrator can manually run the "sr_takeover" command. This keeps a problem with the primary system from taking down the entire SAP landscape, which depends on the HANA database, for a prolonged period. However, much of this work depends on manual intervention. That is good enough for DR, but it is not ideal for HA, where downtime needs to be prevented.

There are additional options for improving the high availability of SAP HANA, SAP S/4HANA, and other components that you can choose from to protect their services and data.

Open-source HA for SAP

Linux vendors such as SUSE and Red Hat offer high-availability extensions with their "Enterprise for SAP" subscriptions. These bundle open-source software such as Pacemaker, Corosync, and DRBD, which allow you to build high-availability clusters for the HANA database, ASCS, ERS, and other SAP components.

Since many online resources already cover how to implement Pacemaker for HANA, I will not post the steps here. For instance, this blog by fellow community member Tomas Krojzl is a good starting point: https://blogs.sap.com/2017/11/19/be-prepared-for-using-pacemaker-cluster-for-sap-hana-part-1-basics/

SIOS High Availability Clustering

SIOS high availability software for SAP can be used to protect SAP S/4HANA, SAP HANA, and SAP Central Services in any configuration (or combination) of virtual, physical, and cloud (public, private, and hybrid) environments. It provides easy and flexible configuration, high-performance replication, and comprehensive monitoring and protection of the entire SAP environment.

Specifically for SAP S/4HANA and SAP HANA databases, SIOS complements HANA system replication by adding complete automated high availability. This includes automated monitoring of key SAP HANA application processes and automated failover and failback, including virtual IPs, even in multi-instance setups.

It can also orchestrate the entire stack. Suppose the primary datacenter as a whole has not failed, but the SAP HANA database has. In that case, you can fail over the entire stack, including application servers and central services, to the other node to preserve performance.
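Full-stack orchestration is essentially a dependency-ordering problem. Resources must stop top-down on the failing node and start bottom-up on the target node. The sketch below shows that idea with Python's standard graphlib module (Python 3.9+). The resource names and dependencies are hypothetical and for illustration only. They do not represent SIOS LifeKeeper's actual resource hierarchy format.

```python
from graphlib import TopologicalSorter

# Hypothetical resources; each maps to the set of resources it depends on.
DEPENDENCIES: dict[str, set[str]] = {
    "vip_hana": set(),
    "hana_db": {"vip_hana"},
    "sapmnt_fs": set(),
    "vip_ascs": set(),
    "ascs": {"sapmnt_fs", "vip_ascs"},
    "pas": {"hana_db", "ascs"},  # the application server needs DB and ASCS
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Dependencies first: the order to bring resources up on the target."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: dict[str, set[str]]) -> list[str]:
    """Dependents first: the order to take resources down on the source."""
    return list(reversed(start_order(deps)))


def failover_plan(deps: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Stop everything on the old node, then start it on the new node."""
    return ([("stop", r) for r in stop_order(deps)]
            + [("start", r) for r in start_order(deps)])


if __name__ == "__main__":
    for action, resource in failover_plan(DEPENDENCIES):
        print(action, resource)
```

Because the order is derived from declared dependencies rather than hard-coded, adding a component (for example, a second application server) only means adding one entry to the graph.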

Figure 2: SIOS HANA System Replication failover high-availability and DR

Below are some key points of the SIOS Protection Suite for SAP HANA HA and DR:

  • Works in the cloud across availability zones (AZs) and regions
  • Provides automated failure detection and failover for key SAP HANA DB components:
    — SAP HANA Host Agent
    — SAP HANA sapstartsrv
    — SAP HANA Replication
  • Enables automated SAP HANA replication takeover and switchback
  • Automatically reverses replication
  • Verifies and monitors that the HANA DB is running
  • Provides virtual IPs
  • "Full stack" failover orchestration with ASCS and other SAP components
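Automated failure detection has to balance fast reaction against false failovers. The Python sketch below shows one generic pattern, not SIOS's actual logic. It assumes a periodic health check, such as whether sapstartsrv and the HANA processes respond, and a hypothetical threshold of three consecutive failed checks. It refuses to take over when the secondary is not in sync, because promoting a stale secondary would lose data.

```python
from dataclasses import dataclass


@dataclass
class MonitorState:
    consecutive_failures: int = 0  # failed health checks in a row


def next_action(state: MonitorState, check_passed: bool,
                secondary_in_sync: bool, failure_threshold: int = 3) -> str:
    """Return 'ok', 'wait', 'takeover', or 'alert_only' for one health check."""
    if check_passed:
        state.consecutive_failures = 0
        return "ok"
    state.consecutive_failures += 1
    if state.consecutive_failures < failure_threshold:
        # Tolerate transient blips to avoid a false failover.
        return "wait"
    if not secondary_in_sync:
        # Taking over from an out-of-sync secondary would lose data,
        # so only raise an alert for a human to decide.
        return "alert_only"
    state.consecutive_failures = 0  # start counting afresh after takeover
    return "takeover"
```

The threshold and check interval together set the detection part of your RTO. For example, three checks at a hypothetical 10-second interval add roughly 30 seconds before any takeover starts.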

Four steps to install and configure HA for HANA database

We will not discuss the specific steps for configuring SAP HANA, since many online resources already cover them. At a high level, there are four basic steps:

  1. Install SAP HANA
  2. Configure HANA System Replication
    See – https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.02/en-US/676844172c2442f0bf6c8b080db05ae7.html
  3. Install SIOS Protection Suite
    See – http://docs.us.sios.com/spslinux/9.4.1/en/topic/sios-protection-suite-for-linux-installation-guide
  4. Use HANA recovery kit (wizard) in GUI to protect HANA
    See – http://docs.us.sios.com/spslinux/9.4.1/en/topic/sap-hana-recovery-kit
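Before running the recovery kit wizard in step 4, it is worth confirming that system replication is actually active. SAP ships a systemReplicationStatus.py script whose exit code summarizes the replication state. The codes below (10 to 15, with 15 meaning ACTIVE) are the ones commonly relied on by HA tooling, but treat them as an assumption and verify them for your HANA revision. The check_hsr wrapper is hypothetical. It assumes it is run as root on the primary node and that HDBSettings.sh is on the <sid>adm user's PATH.

```python
import subprocess

# Exit codes of SAP's systemReplicationStatus.py as commonly used by HA
# tooling (assumption: verify against your HANA revision).
HSR_STATUS = {
    10: "NO HSR",
    11: "ERROR",
    12: "UNKNOWN",
    13: "INITIALIZING",
    14: "SYNCING",
    15: "ACTIVE",
}


def describe_hsr_rc(rc: int) -> str:
    """Translate the script's exit code into a readable state."""
    return HSR_STATUS.get(rc, f"UNEXPECTED ({rc})")


def hsr_ready_for_protection(rc: int) -> bool:
    """Only put the database under HA protection once replication is ACTIVE."""
    return rc == 15


def check_hsr(sid: str) -> str:
    """Hypothetical wrapper: run the status script as <sid>adm (needs root)."""
    user = f"{sid.lower()}adm"
    result = subprocess.run(
        ["su", "-", user, "-c", "HDBSettings.sh systemReplicationStatus.py"],
        capture_output=True, text=True, check=False,
    )
    return describe_hsr_rc(result.returncode)
```

A state of SYNCING or INITIALIZING means the secondary is still catching up. Protecting the database at that point would allow a takeover onto an incomplete copy.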

The installation process flow is similar for other SAP components (ASCS, ERS, PAS, Web Dispatcher, etc.) as well.

The HANA recovery kit included in SIOS Protection Suite provides a wizard in the SIOS LifeKeeper management GUI. With it, you can quickly protect a HANA database instance, assign the virtual IP address that clients connect to, and manage the entire stack. In a multi-instance environment, the solution manages all the instances, virtual IPs, etc. within a fully integrated GUI. This makes it easy to configure and manage the entire SAP landscape running on SIOS HA.

Figure 3: SIOS LifeKeeper Management GUI for SAP HANA and SAP Central Services

 

Additional HA/DR stack components that SIOS can protect for SAP

Besides the HANA database, SIOS Protection Suite also protects key SAP services and supporting applications, all of which can be managed from the same GUI:

  • Primary Application Server (PAS)
  • ABAP SAP Central Service (ASCS)
  • SAP Central Services (SCS)
  • Enqueue and message servers
  • Enqueue Replication Server (ERS)
  • Database (Oracle, Sybase, MaxDB, HANA, etc.)
  • Shared and/or Replicated File Systems
  • Logical Volumes (LVM)
  • NFS Mounts and Exports
  • Virtual IPs

The deployment steps are similar to those for SAP HANA. The key difference is how you protect the shared file system. This depends on whether you use NFS, which SIOS can replicate and protect, or a cloud provider's file share service. If you use a file share service, bear in mind that it also has to be replicated and redundant across AZs or regions.

Clustering in the cloud

When moving SAP to the cloud, one of the key challenges is how to protect the SAP database as well as the SAP application stack in an SAP-supported architecture. SIOS has been at the forefront of this move, and its solutions are designed for, certified by, and supported by SAP as well as all the major cloud providers.

The diagram below shows a high-level design for deploying a pair of S/4HANA systems across different availability zones, or even regions. Cloud providers offer very low latency between AZs, so it is entirely possible to use synchronous replication across AZs. This creates a pair of highly available S/4HANA systems that serve not just for HA but also for DR at the same time. It works because AZs are geographically separate datacenters, much like on-premises DR datacenters, with highly redundant, high-speed network connectivity between them.
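Whether synchronous replication is practical across AZs but usually not across distant regions comes down to round-trip time. The Python function below is a simplified latency model. It assumes the primary writes its log locally while shipping it to the secondary in parallel. The numbers in the usage example are hypothetical, not measurements from any cloud provider.

```python
def commit_latency_ms(mode: str, local_write_ms: float, rtt_ms: float,
                      remote_write_ms: float) -> float:
    """Simplified model of how long a commit waits, per replication mode.

    Assumes the local log write and log shipping to the secondary happen
    in parallel, so the commit waits for whichever path is slower.
    """
    mode = mode.lower()
    if mode == "async":
        return local_write_ms  # the secondary is not waited for
    if mode == "syncmem":
        return max(local_write_ms, rtt_ms)  # secondary acks on receipt
    if mode == "sync":
        return max(local_write_ms, rtt_ms + remote_write_ms)  # acks on persist
    raise ValueError(f"unknown replication mode: {mode}")
```

Take a hypothetical 0.5 ms local log write, a 0.5 ms remote write, and a 1 ms cross-AZ round trip: commit_latency_ms("sync", 0.5, 1.0, 0.5) gives 1.5 ms. With a 30 ms cross-region round trip, the same call gives 30.5 ms for every single commit.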

SIOS Protection Suite for SAP S/4HANA cloud architecture
Figure 4: SIOS Protection Suite for SAP S/4HANA cloud architecture

 

Should you use a commercial HA solution like SIOS over open-source HA for SAP?

This question will invariably come up. Some Linux vendors already provide their own HA extensions (HAE) or clustering, so why would anyone want to use a commercial third-party HA solution like SIOS?

At the top of this article, I listed the key questions to ask when selecting an HA/DR solution: what RPO, RTO, and SLA do you need to meet? Use them as the basis for reviewing the points below and for deciding whether the solution you are evaluating fits your objectives.

  1. Open-source HA is offered as part of certain OS vendors' "enterprise for SAP" subscriptions. It comes at a cost, it is definitely not free, and not all Linux distributions are supported. SIOS supports all the major Linux distributions, including Red Hat, SUSE, CentOS, and Oracle Linux. For customers who want to run Windows for their ASCS, Content Server, etc., SIOS also has a Windows-based solution with Windows clustering support. This makes it a one-stop shop for the entire SAP landscape regardless of platform.
  2. Commercial HA support: OS vendors depend on the open-source community for bug fixes. This can be a problem if a bug falls to a less active contributor and takes a long time to be fixed. SIOS provides commercial support, with dedicated support and development teams just for its high-availability solution and 24×7 support. This gives customers much more confidence when issues develop.
  3. Open-source tools require complex setup and administration via the command line. They are made up of different components, such as Pacemaker and Corosync, maintained by different open-source projects. SIOS provides an all-in-one GUI for wizard-based setup and administration, which allows SAP HA to be deployed in hours instead of weeks or months.
  4. SIOS provides pre-built application monitoring and failover orchestration, through a wizard in the GUI, for all SAP and cloud components that require HA. HA extensions, by contrast, still require a lot of manual configuration.
  5. SIOS automatically ensures that SAP ERS always runs on the opposite node from ASCS, even in a multi-node ASCS setup. Suppose a failover occurs and ASCS moves to the node running ERS. When the original ASCS node recovers, ERS is automatically switched across to it, so that the lock table always has the redundancy it needs. Open-source solutions require this to be done manually, which impacts reliability and availability, especially during multiple failures and recoveries.
  6. SIOS reduces implementation and management time and costs. The less time you spend implementing and maintaining HA, the more time you have for other important tasks.
  7. Open-source solutions rely on the STONITH fencing mechanism, which has proven unreliable, especially in cloud environments. SIOS takes a multi-pronged approach to preventing false failovers and split-brain, using a quorum witness and multiple communication paths (heartbeats). This approach has proven highly reliable in many scenarios over more than 20 years.
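The ERS placement rule from the list above can be expressed as a small check. ERS should run on any available node other than the one hosting ASCS, and it needs to move whenever it ends up on the ASCS node. The Python sketch below is a generic illustration with hypothetical node names and a hypothetical priority list, not SIOS's implementation.

```python
from typing import Optional


def choose_ers_node(nodes: list[str], ascs_node: str,
                    up: set[str]) -> Optional[str]:
    """Pick the first node in priority order that is up and not hosting ASCS.

    Returns None when no other node is available, i.e. the enqueue lock
    table currently has no replica.
    """
    for node in nodes:
        if node != ascs_node and node in up:
            return node
    return None


def ers_needs_move(ers_node: Optional[str], ascs_node: str) -> bool:
    """ERS on the same node as ASCS (or not running) gives no lock redundancy."""
    return ers_node is None or ers_node == ascs_node
```

Consider a two-node cluster where ASCS has failed over onto the ERS node. While the other node is down, choose_ers_node returns None because no lock table replica can exist. As soon as that node is back up, it returns that node, and that is the moment ERS should be switched across.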
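The split-brain point in the list above relies on two standard ideas. The first is majority quorum: a partition may keep running resources only if it holds a strict majority of votes, with a witness acting as the tie-breaker in a two-node cluster. The second is multiple heartbeat paths: a peer is declared down only when every path is silent. The Python sketch below models both generically. The node and witness names are hypothetical, and SIOS's actual quorum/witness configuration options are not modeled here.

```python
def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """A partition may run resources only with a strict majority of votes."""
    return 2 * visible_votes > total_votes


def surviving_partitions(partitions: list[set[str]],
                         voters: set[str]) -> list[set[str]]:
    """Return the network partitions that keep quorum after a split."""
    total = len(voters)
    return [p for p in partitions if has_quorum(len(p & voters), total)]


def peer_considered_down(heartbeat_paths_alive: list[bool]) -> bool:
    """Declare the peer down only when every communication path is silent."""
    return not any(heartbeat_paths_alive)
```

Take two nodes and a witness. A network split leaves one node plus the witness with 2 of 3 votes, so only that side continues. With two nodes and no witness, neither side has a majority, which is why a tie-breaker is needed.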

 

Summary

The SAP HANA system replication feature comes as part of the software and works well to protect the database from data loss when hardware or system failures occur. However, if high availability is the requirement, you still need a third-party solution to get automated monitoring, failover orchestration, virtual IPs, and so on. There are open-source options in the form of enterprise Linux subscriptions for SAP, but they certainly do not come free. Their technical support is also limited, because they rely on the open-source community to maintain projects such as Pacemaker and Corosync and to provide fixes. There are also limitations in native System Replication and in open-source HAE that can be overcome by a commercial software vendor like SIOS.

Hence, SIOS, as a reliable third-party high-availability solution provider certified by SAP, could be a viable option. It can help enterprise customers get the reliability and high availability they need in their mission-critical SAP operations, for peace of mind.

Parts of this blog post were taken from my earlier post at https://www.sios-apac.com/2020/04/high-availability-dr-for-s-4hana-and-other-sap-platforms
