Sunday, 4 August 2019

Citrix Application Delivery Management (ADM / NMAS) - Database Sync Issue


Issue Occurred: The issue showed up in a brand-new deployment; even though it was a fresh install, and both ADM nodes were in the same VLAN and the same network, HA database sync was failing.

The error shown was: "Database streaming channel is broken between HA nodes."

To solve this:

  1. Run this command on the secondary node (replace the placeholder IPs with your actual addresses):
    • nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP nsroot > /var/mps/log/join_streaming_replication_console.log 2>&1 &
  2. Monitor the output of this command in /var/mps/log:
    • tail -f join_streaming_replication_console.log
  3. Wait a few minutes to an hour, depending on DB size, then confirm the HA channel is UP by running the command below (a polling sketch follows this list):
    • ps -ax | grep -i wal
  4. You should see this line, confirming the channel is UP:
    • ?? Ss 0:14.14 postgres: wal receiver process streaming
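
If you would rather not keep re-running step 3 by hand, here is a minimal bash sketch that polls with the same ps check until the WAL receiver appears (the 60-second interval and the final message are arbitrary choices; adjust the interval for your DB size):

# Poll until the postgres WAL receiver process shows up,
# i.e. the HA streaming channel is UP.
# The [w] in the pattern stops grep from matching its own process line.
while ! ps -ax | grep -i '[w]al receiver' > /dev/null; do
    sleep 60
done
echo "HA streaming channel is UP"

For reference, here is the console output from an actual run: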

bash-3.2#
bash-3.2# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh 172.16.2.40 172.16.2.39 nsroot > /var/mps/log/join_streaming_replication_console.log 2>&1 &
[1] 4952
bash-3.2#
bash-3.2# cd /var/mps/log
bash-3.2# tail -f join_streaming_replication_console.log
Stopping appd
Stopping nsulfd
Reinitializing monit daemon
Reinitializing monit daemon
monit daemon with pid [629] killed
Stopped nsulfd
Stopped appd
waiting for server to shut down.... done
server stopped
cleaning up postgres data...
physical replication
---------------------
backup data from master node...this might take some time..
sleeping 20 seconds...
pg_basebackup: Done
set kernel params for tcp keepalive parameters
net.inet.tcp.keepidle: 7200000 -> 5000
net.inet.tcp.keepintvl: 75000 -> 1000
server starting
shutdown: [pid 5319]
Shutdown NOW!
*** FINAL System shutdown message from nsroot@ctadm02 ***

System going down IMMEDIATELY

Shutdown NOW!

System shutdown time has arrived

Message from syslogd@ctadm02 at Jul 30 13:18:41 ...
<auth.emerg> ctadm02 init: Rebooting via init mechanism

Friday, 11 August 2017

ASA: Drop-reason: (conn-limit) Connection limit reached

On a normal day, while I was sitting back and relaxing, I got a P1 call from one of my customers, flagged as high urgency and business critical. We got onto the call. Below are the details.

Issue Reported: Connections were being dropped by the ASA on the outside and other interfaces.

Initial Log Provided: Only a packet-tracer output from the ASA.

pri/act# packet-tracer input dmz3 udp 10.10.21.24 2343 192.168.100.74 111 detailed

Phase: 1
Type: ROUTE-LOOKUP
Subtype: input
Result: ALLOW
Config:
Additional Information:
in   192.168.0.0     255.255.0.0     inside

Phase: 2
Type: ROUTE-LOOKUP
Subtype: input
Result: ALLOW
Config:
Additional Information:
in   10.10.21.0      255.255.255.0   dmz3

Result:
input-interface: dmz3
input-status: up
input-line-status: up
output-interface: inside
output-status: up
output-line-status: up
Action: drop
Drop-reason: (conn-limit) Connection limit reached

Hardware: ASA 5520 running the 8.2 version of code.

Troubleshooting Steps / Strategy
We started asking questions and finally narrowed it down to one statement: connections for a specific source and destination were getting dropped.

Given the drop reason, we started by looking at resource usage on the ASA:
1. show resource usage all

Resource               Current        Peak      Limit        Denied Context
SSH                          0           5          5            44 System
Syslogs [rate]              77        1804        N/A             0 System
Conns                     4156        9817     280000             0 System
Xlates                    2912        8400        N/A             0 System
Conns [rate]                62         984        N/A             0 System
Inspects [rate]             13         584        N/A             0 System

This could not be the reason, as everything was within limits.

2. Checked CPU and memory

CPU was only at 31% and memory at 61%.
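
(These figures come from the usual ASA checks, listed here for completeness; exact output varies by version:)

show cpu usage
show memory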

3. Checked the static NAT configuration, where a connection limit might have been defined for this specific connection.

There was no static NAT (or any other NAT) statement defining a limit for this specific flow; the only limits defined were up to 5 for UDP connections and 2000 for TCP. An example of what such a statement looks like is sketched below.
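
In 8.2 code, per-translation limits ride on the static statement itself. A hypothetical example of the kind of entry we were hunting for (an identity static for the destination host from the packet-tracer above; the interfaces and limit values are illustrative — the trailing numbers are max TCP connections, the embryonic limit, and max UDP connections):

static (inside,dmz3) 192.168.100.74 192.168.100.74 netmask 255.255.255.255 tcp 2000 0 udp 5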

4. Checked the ASP drop counters:

show asp drop

The ASP drop output showed the counter for packets dropped due to the connection limit steadily incrementing.
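
As an additional check, the asp-drop capture type can grab the dropped packets themselves rather than just counting them (a sketch; the capture name is arbitrary, and conn-limit is the drop-reason code from the packet-tracer above):

capture CONNLIMIT type asp-drop conn-limit
show capture CONNLIMIT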


5. It took more than three hours to go through all the statements, and we still had no clue about the connection limit; but it was clearly not a hardware limitation.

6. Finally we found the cause: when a connection limit is defined on an ASA static NAT statement, it does not take both source and destination into consideration; the limit in NAT is applied with respect to the source alone.

And indeed, we found a static statement where a /24 subnet was limited to a set number of connections.

Conclusion: No technical document spells out that a connection limit configured in a static statement does not match both source and destination; it matches only the source and enforces the limit there.

Recommendation: Use MPF (a policy-map) to limit the number of connections wherever the match cannot be expressed in the static statement; a sketch follows.
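
A minimal sketch of the MPF approach (the ACL, class-map, and policy-map names are made up and the numbers illustrative; the source subnet and destination host mirror the packet-tracer above):

! Match the specific source subnet AND destination host
access-list DMZ3_TO_RPC extended permit udp 10.10.21.0 255.255.255.0 host 192.168.100.74 eq 111
!
class-map CM_CONN_LIMIT
 match access-list DMZ3_TO_RPC
!
policy-map PM_DMZ3
 class CM_CONN_LIMIT
  set connection conn-max 2000
!
! Apply on the interface where the traffic enters
service-policy PM_DMZ3 interface dmz3

Because the class-map matches an ACL, the limit now applies only to flows that match both the source and the destination, which the static-statement limit could not do.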