COMMAND NAME: gprecoverseg

Recovers a primary or mirror segment instance that has 
been marked as down (if mirroring is enabled).

******************************************************
Synopsis
******************************************************

gprecoverseg [-p <new_recover_host>[,...]]
             |-i <recover_config_file> 
             [-d <coordinator_data_directory>]
             [-B <batch_size>] [-b <segment_batch_size>]
             [-F] [-a] [-q] [-s] [--no-progress] [-l <logfile_directory>]


gprecoverseg -r


gprecoverseg -o <output_recover_config_file> 
             [-p <new_recover_host>[,...]]

gprecoverseg -? 

gprecoverseg --version

******************************************************
DESCRIPTION
******************************************************

In a system with mirrors enabled, the gprecoverseg utility reactivates 
a failed segment instance and identifies the changed database files 
that require resynchronization. Once gprecoverseg completes this 
process, the system goes into resyncronizing mode until the recovered 
segment is brought up to date. The system is online and fully 
operational during resyncronization.

A segment instance can fail for several reasons, such as a host failure, 
network failure, or disk failure. When a segment instance fails, its 
status is marked as down in the Apache Cloudberry system catalog, 
and its mirror is activated in change tracking mode. In order to 
bring the failed segment instance back into operation again, you 
must first correct the problem that made it fail in the first place, 
and then recover the segment instance in Apache Cloudberry 
using gprecoverseg.

Segment recovery using gprecoverseg requires that you have an active 
mirror to recover from. For systems that do not have mirroring enabled, 
or in the event of a double fault (a primary and mirror pair both down 
at the same time) - do a system restart to bring the segments back 
online (gpstop -r).

By default, a failed segment is recovered in place, meaning that 
the system brings the segment back online on the same host and data 
directory location on which it was originally configured. In 
this case, use the following format for the recovery configuration file 
(using -i).

<failed_host_address>|<port>|<data_directory>

In some cases, this may not be possible (for example, if a host was 
physically damaged and cannot be recovered). In this situation, 
gprecoverseg allows you to recover failed segments to a completely 
new host (using -p), on an alternative data directory location on 
your remaining live segment hosts (using -s), or by supplying a 
recovery configuration file (using -i) in the following format. 
The word SPACE indicates the location of a required space. Do not 
add additional spaces.

<failed_host_address>|<port>|<data_directory>SPACE
[<recovery_host_address>|<port>|<data_directory>

See the -i option below for details and examples of a recovery
configuration file.

The gp_segment_configuration system catalog tables can help you
determine your current segment configuration so that you can plan
your mirror recovery configuration. For example, run the following
query:

=# SELECT dbid, content, address, port, datadir 
   FROM gp_segment_configuration
   ORDER BY dbid;

The new recovery segment host must be pre-installed with the Cloudberry 
Database software and configured exactly the same as the existing 
segment hosts. A spare data directory location must exist on all 
currently configured segment hosts and have enough disk space to 
accommodate the failed segments.

The recovery process marks the segment as up again in the Cloudberry 
Database system catalog, and then initiates the resyncronization 
process to bring the transactional state of the segment 
up-to-date with the latest changes. The system is online 
and available during resyncronization.

If you do not have mirroring enabled or if you have both a 
primary and its mirror down, you must take manual steps to 
recover the failed segment instances and then restart the 
system, for example:

gpstop -r

******************************************************
OPTIONS
******************************************************

-a

Do not prompt the user for confirmation.


-B batch_size

The number of hosts to work on in parallel. If not specified,
the utility will start working on up to 16 hosts in parallel.
Valid values: 1-64


-b segment_batch_size

The number of segments per host to work on in parallel. If not
specified, the utility will start recovering up to 64 segments
in parallel on each host it is working on.
Valid values: 1-128


-d coordinator_data_directory

Optional. The coordinator host data directory. If not specified, 
the value set for $COORDINATOR_DATA_DIRECTORY will be used.


-F

Optional. Perform a full copy of the active segment instance 
in order to recover the failed segment. The default is to 
only copy over the incremental changes that occurred while 
the segment was down.


-i <recover_config_file>

Specifies the name of a file with the details about failed segments to recover. 
Each line in the file is in the following format. The word SPACE indicates 
the location of a required space. Do not add additional spaces.

<failed_host_address>|<port>|<data_directory>SPACE
<recovery_host_address>|<port>|<data_directory>

Comments
Lines beginning with # are treated as comments and ignored.

Segments to Recover
Each line after the first specifies a segment to recover. This line can 
have one of two formats. In the event of in-place recovery, enter one 
group of colon delimited fields in the line:

failedAddress|failedPort|failedDataDirectory

For recovery to a new location, enter two groups of fields separated by 
a space in the line. The required space is indicated by SPACE. Do not 
add additional spaces.

failedAddress|failedPort|failedDataDirectorySPACEnewAddress|newPort|
newDataDirectory

Examples

In-place recovery of a single mirror

sdw1-1|50001|/data1/mirror/gpseg16

Recovery of a single mirror to a new host

sdw1-1|50001|/data1/mirror/gpseg16SPACE sdw4-1|50001|51001|/data1/recover1/gpseg16

Obtaining a Sample File

You can use the -o option to output a sample recovery configuration file 
to use as a starting point.

-l <logfile_directory>

The directory to write the log file. Defaults to ~/gpAdminLogs.

-o <output_recover_config_file>

Specifies a file name and location to output a sample 
recovery configuration file. The output file lists the 
currently invalid segments and their default recovery 
location in the format that is required by the -i option. 
Use together with the -p option to output a sample file 
for recovering to a different host. This file can be 
edited to supply alternate recovery locations if needed.


-p <new_recover_host>[,...]

Specifies a spare host outside of the currently configured 
Apache Cloudberry array on which to recover invalid segments. In 
the case of multiple failed segment hosts, you can specify a 
comma-separated list. The spare host must have the Apache Cloudberry 
software installed and configured, and have the same hardware and OS 
configuration as the current segment hosts (same OS version, locales, 
gpadmin user account, data directory locations created, ssh keys 
exchanged, number of network interfaces, network interface naming 
convention, and so on.).


-q

Run in quiet mode. Command output is not displayed on 
the screen, but is still written to the log file.


-r

After a segment recovery, segment instances may not be returned to the 
preferred role that they were given at system initialization time. This 
can leave the system in a potentially unbalanced state, as some segment 
hosts may have more active segments than is optimal for top system performance. 
This option rebalances primary and mirror segments by returning them to 
their preferred roles. All segments must be valid and synchronized before 
running gprecoverseg -r. If there are any in progress queries, they will 
be cancelled and rolled back.

-s

Show pg_rewind/pg_basebackup progress sequentially instead of inplace. Useful
when writing to a file, or if a tty does not support escape sequences. Defaults
to showing the progress inplace.


--no-progress

Do not show pg_rewind/pg_basebackup progress. Useful when writing to a
file. Defaults to showing the progress.


--hba-hostnames

Optional. use hostnames instead of CIDR in pg_hba.conf


-v

Sets logging output to verbose.

--version

Displays the version of this utility.

-?
Displays the online help.


******************************************************
EXAMPLES
******************************************************

Recover any failed segment instances in place:

 $ gprecoverseg


Rebalance your Apache Cloudberry system after a recovery by
resetting all segments to their preferred role. First check
that all segments are up and synchronized.

  $ gpstate -m

  $ gprecoverseg -r


Recover any failed segment instances to a newly configured 
spare segment host:

  $ gprecoverseg -i recover_config_file


Output the default recovery configuration file:

  $ gprecoverseg -o /home/gpadmin/recover_config_file


******************************************************
SEE ALSO
******************************************************

gpstart, gpstop, gpstate
