Thursday, September 4, 2014

Juniper IDP Fix

After upgrading a Juniper SRX (branch versions) from 11.x to 12.x, IDP can get stuck in a very weird state: it starts detecting false positives and fails when updating.

To fix (for clusters):
configure 
set system processes idp-policy disable 
deactivate security idp 
commit and-quit

Then remove the IDP directories (running "request security idp storage-cleanup" will NOT fix the issue; you need to force-remove the old files and stuck policies in there):
start shell 
cd /var/db/idpd 
rm -r * 
exit

Repeat the commands on the secondary node:
request routing-engine login node 1 
cd /var/db/idpd 
rm -r * 
exit

Check cluster status to make sure the redundancy groups are all on one node:

>show chassis cluster status
Cluster ID: 1
Node      Priority   Status      Preempt   Manual failover

Redundancy group: 0 , Failover count: 1
node0     100        primary     no        no
node1     1          secondary   no        no

Redundancy group: 1 , Failover count: 1
node0     100        primary     no        no
node1     1          secondary   no        no


Reboot the secondary node:

request system reboot node 1

Go get coffee while it comes back up... (keep checking the cluster status until node 1 shows "secondary" and not "lost").
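Instead of retyping the status command, you can poll it from the Junos shell. A minimal sketch of the loop; the real command (shown in the comment) only exists on the SRX itself, so this version defaults to a harmless stand-in just to illustrate the shape:

```shell
# On the SRX, from "start shell", you would set:
#   STATUS_CMD='cli -c "show chassis cluster status"'
# The default below is a stub so the loop can be shown end to end.
STATUS_CMD=${STATUS_CMD:-'echo "node1     1          secondary   no        no"'}

# Keep checking every 30 seconds until node1 reports "secondary"
# (i.e., it is back in the cluster and no longer "lost").
until eval "$STATUS_CMD" | grep -q 'node1.*secondary'; do
  sleep 30
done
eval "$STATUS_CMD"
```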

After it's back, fail over to the secondary node:

request chassis cluster failover redundancy-group 1 node 1
request chassis cluster failover redundancy-group 0 node 1


If you are SSH'ing into the device for management, reconnect, then check the cluster status.

Once node 1 shows primary on both redundancy groups, reboot node 0.

request system reboot node 0

Go to the restroom while you wait for it to come back online... (keep checking the cluster status until node 0 shows "secondary" and not "lost").

Once they are both up and in their primary/secondary states, reset the failover.

request chassis cluster failover reset redundancy-group 1
request chassis cluster failover reset redundancy-group 0


Re-enable the IDP process:

configure
delete system processes idp-policy disable
commit and-quit

At this point, sometimes the update will fail. You can either reboot the nodes AGAIN with another failover, or try issuing "request security idp storage-cleanup downloaded-files".

Download the full IDP update: request security idp security-package download full-update

Check the status when it's finished: request security idp security-package download status

If it DID NOT sync OK, you will have to manually copy the files to the secondary node and move them to the right directory:
start shell
rcp -T -r /var/tmp/sec-download/* node1:/var/db/idpd/sec-download/
mv /var/tmp/sec-download/* /var/db/idpd/sec-download


Install/update IDP: request security idp security-package install

Check the status. When it's done, it should look something like:

request security idp security-package install status

node0:
--------------------------------------------------------------------------
Done;Attack DB update : successful - [UpdateNumber=2415,ExportDate=Wed Sep 3 18:26:00 2014 UTC,Detector=12.6.160140626]
Updating control-plane with new detector : successful
Updating data-plane with new attack or detector : not performed
due to no active policy configured.

node1:
--------------------------------------------------------------------------
Done;Attack DB update : successful - [UpdateNumber=2415,ExportDate=Wed Sep 3 18:26:00 2014 UTC,Detector=12.6.160140626]
Updating control-plane with new detector : successful
Updating data-plane with new attack or detector : not performed
due to no active policy configured.


If you use policy templates, make sure to re-run the download and install for them:
request security idp security-package download policy-templates
request security idp security-package install policy-templates


Re-Enable IDP:

configure
activate security idp
commit and-quit



And verify it's running:

show security idp policy-commit-status
show security idp status


Tuesday, April 8, 2014

Fixing Juniper SRX VPN Issues for "KMD_INTERNAL_ERROR: Error:File exists in adding SA config for tunnel id xxxxxx spi 0"

If you have funky issues where your tunnels refuse to connect and "show security ike security-associations" shows DOWN with a responder cookie of 0000000000000000, check your kmd log. If you see any entries with this obscure message: "KMD_INTERNAL_ERROR: Error:File exists in adding SA config for tunnel id 666666 spi 0", then read on...

To fix this issue, you have two options:
1) Reboot (if it's in a cluster, reboot both nodes simultaneously), or
2) Edit the config: completely remove the dead/broken st0.xxx interfaces from the config (including all references to them in the security ike, ipsec, and zones sections). Do a "commit full" and wait for it to finish. Then roll the config back to before you removed the interfaces (in config mode, it's "rollback 1", then commit). Afterwards the VPN tunnels will miraculously come back to life on these horribly buggy firewalls.
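For the config-edit option, the sequence looks something like this. All of the names below (st0.100, VPN-SITEB, GW-SITEB, vpn-zone) are hypothetical; substitute whatever your broken tunnel actually uses:

```
configure
## Remove the broken st0 unit and every reference to it:
delete interfaces st0.100
delete security ipsec vpn VPN-SITEB          ## the vpn with bind-interface st0.100
delete security ike gateway GW-SITEB         ## if nothing else references it
delete security zones security-zone vpn-zone interfaces st0.100
commit full
## Wait for the commit to finish, then restore the original config:
rollback 1
commit
```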

Friday, November 22, 2013

Juniper SRX Dual-ISP w/redundant VPNs by Example

Preface: I created this for the poor souls out there who purchased a Juniper SRX and realized how utterly complicated these firewalls are and how miserable the documentation for configuring them is. I would never recommend purchasing these... there are easier and far more stable firewalls out there that can do the same thing as these horrible devices. Now, for those who are stuck with them and need a quick guide on how to get these muthers working, read on.

Please be aware that I whipped this up rather quickly, so I'm sure there are some errors here and there. I'll update it as necessary. Also, if there are better ways to configure this or more optimized methods, please don't hesitate to comment!

Moving on... here's a summary of each site:


Site A: 2x SRX 220s running in a cluster with a dual-ISP setup. It also runs an SMTP server that is reachable on both the primary and secondary ISPs. IP monitoring reroutes traffic automatically if either ISP goes down.

Site B: A single SRX w/IDP running.

Between the sites, VPN connections are set up over every ISP. OSPF is running for redundancy and to take care of all the static routes. All SRXes are running 11.4 or later.






Configs
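As a flavor of the IP-Monitoring piece on Site A, here is a minimal sketch. The probe name, policy name, target address, interface, and next-hop are all made up; adjust them to your ISPs:

```
## RPM probe that pings a target out the primary ISP:
set services rpm probe ISP1-PROBE test PING-TEST probe-type icmp-ping
set services rpm probe ISP1-PROBE test PING-TEST target address 198.51.100.1
set services rpm probe ISP1-PROBE test PING-TEST probe-count 3
set services rpm probe ISP1-PROBE test PING-TEST probe-interval 15
set services rpm probe ISP1-PROBE test PING-TEST test-interval 10
set services rpm probe ISP1-PROBE test PING-TEST thresholds successive-loss 3
set services rpm probe ISP1-PROBE test PING-TEST destination-interface ge-0/0/0.0

## When the probe fails, swap the default route to the backup ISP:
set services ip-monitoring policy ISP1-WATCH match rpm-probe ISP1-PROBE
set services ip-monitoring policy ISP1-WATCH then preferred-route route 0.0.0.0/0 next-hop 192.0.2.1
```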


Thursday, October 11, 2012

VMWare ESXi 4/5 APD Lockup Problem

Problem: You click Rescan All... in the vSphere client and the ESXi host becomes unmanageable due to a dead LUN, a downed path, or an offlined volume (this is for iSCSI; I don't know whether the problem happens with other storage types). The only fix is to hard-reboot the server.

Despite this long and lengthy KB article from VMware on how to do this cleanly (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004605), the procedure is still prone to a lot of errors and most likely will not work for your environment. I'm not about to connect to 6 different hosts and run all this nonsense just to make sure VMware cleanly unmounts a volume.

The quick fix: Go to your Storage Adapters and open the properties of the iSCSI Software Adapter. Click the Static Discovery tab and remove the dead connections. Then you can rescan without the host locking up. No other method has proven reliable for me.

Update: this still didn't fix the issue. The only real way to overcome the problem is to upgrade to 5.1, where they finally fixed it.

Wednesday, September 19, 2012

Sophos (Shh/Updater-B)

Before I go into my rant on how fucked up Sophos' Endpoint Protection management system is, let me run through how I fixed this problem with the false detection.

My policies were set to move malware if cleanup failed. Fortunately, only a handful of computers actually managed to move any of these files before I was able to update the policy. Also fortunately, my antivirus server was NOT running the Sophos client, so nothing on the server broke.

Immediately I added these Windows exclusions to all on-access policies:

C:\Program Files (x86)\Sophos
C:\Program Files\Sophos
C:\ProgramData\Sophos

I also changed the policies to block instead of move at this time.

I forced all clients to update the policy. Next I forced the update manager to grab the fix and let it push out to our file server, which eventually synced it to all locations via DFS.

Now the fun... for the clients that didn't break themselves, I let them self-update and that fixed the issue.

For the clients that did quarantine/delete Sophos' own update files (grabbed a list of them by checking which computers were not fully up to date in the console), I copied the entire C:\Program Files (x86)\Sophos\AutoUpdate directory as well as the update definition that fixes this false detection to our server.

Ran a quick script to copy back the programs and dll files Sophos removed:
xcopy "\\Server\SophosAutoUpdate\*.*" "C:\Program Files\Sophos\AutoUpdate\*.*" /y
xcopy "\\Server\SophosIDE\javab-bd.ide*" "C:\Program Files\Sophos\Sophos Anti-Virus\*.*" /y
(on a 64-bit OS, change the destination paths to "C:\Program Files (x86)\Sophos\...")

Restarted Sophos Anti-Virus service (SAVservice) and Sophos Updater with Dameware and I was back in business.

For clients who are not connected, I think the best bet is to send them a script to stop Sophos Anti-Virus, then have them run an executable ZIP file to restore the AutoUpdate and java*.ide file.  I cannot imagine how larger corporations are dealing with this disaster.  For the company I am at, I was able to catch this issue when the first round of alerts started flooding in.  And luckily since we use DFS to distribute updates, I used "Previous versions" to restore back the files that were modified before the last update to stop the spread.

---

Now the rant...   Sophos' support on this problem was beyond horrible.  How this update slipped through QA is unforgivable.  In fact the only way this could have slipped through their quality control is if they didn't have quality control or testing.  Otherwise, they would have realized this update breaks their own program!

I understand some other antiviruses released bad updates, but NEVER have I ever seen one that actually detected itself as a virus.

How did they respond to this fuckup? They issued one advisory, which was so vague it would not fix anyone's issue unless their policy was set to do nothing when malware was detected (which I don't even think is the default setting). Their support lines were unreachable from the massive number of customers calling in, their email support was non-existent, and it appears the only help available was one employee responding periodically on their forums.

Regardless of this recent incident, there were numerous other annoyances that aggravated me:
1. You cannot unquarantine files remotely. You have to manually go to the client computer and run Sophos from there. On top of this, quarantined files are not moved back to their original location; you have to sort through the log files and figure out where they came from. FAIL.

2. Server-client communication is sub-optimal. The clients stay connected to the server over 2 ports at all times. It's not a simple push/pull method but a constant connection; it drains server resources and is just an overall poor design, probably meant for a network of 20 computers, not hundreds or thousands.

3. Version 10 and the bloat. Their "web-intelligence" services (2 more services it has to run) break a lot of network programs. Disabling it in the policy has no effect; the only fix is to actually set the service to disabled. It's a broken LSP (Layered Service Provider) that destroyed our SharePoint server (email notifications stopped working, SQL connections broke), and some clients were not able to browse the web or use Oracle applications.


Sophos did have its advantages back in the day: the lightest and strongest antivirus out there. But I'm afraid its time has gone; they are not improving the product, just bloating it with useless add-ons, making it an absolute disaster to manage and maintain. I'm going to have to take a peek at Vipre. I had some issues when testing it years ago, but at least it was manageable, and I had the ability to release quarantined files.

Tuesday, April 10, 2012

Red Hat 5, iSCSI and multipath

So you setup multipath with iSCSI on Red Hat 5 but noticing traffic is only going out 1 interface?

The problem is that iscsiadm seems to ignore the physical interface you are trying to bind to. I think you can manually force the iface when setting up each node, but when you have a storage array with 5 adapters and your server has 3 adapters in use, do you really want to enter 15 commands per volume? If you are as lazy as I am, the quick fix is to edit each iface entry in /var/lib/iscsi/ifaces and add the line:
"iface.net_ifacename = eth0" where eth0 is the physical interface you are binding.
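In other words, each iface file gets its own net_ifacename line pointing at its own NIC. A quick sketch of the edit using a scratch directory (on a real host the files live in /var/lib/iscsi/ifaces, and the file names and eth0/eth1 below are examples):

```shell
# Stand-in for /var/lib/iscsi/ifaces with two iface entries.
IFACE_DIR=$(mktemp -d)
printf 'iface.transport_name = tcp\n' > "$IFACE_DIR/iface0"
printf 'iface.transport_name = tcp\n' > "$IFACE_DIR/iface1"

# Bind each iface to its own physical NIC.
echo 'iface.net_ifacename = eth0' >> "$IFACE_DIR/iface0"
echo 'iface.net_ifacename = eth1' >> "$IFACE_DIR/iface1"

cat "$IFACE_DIR/iface0"
```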

Then it's simply a matter of discovering the volumes:
 iscsiadm -m discovery -t st -p <iscsi discovery IP>

Log the node in on all available ifaces in 1 shot (or use "iscsiadm -m node --loginall=all" to log in to every discovered node at once):
 iscsiadm -m node iqn.veryveryverylongname000000666.feedge --login

Check multipath:
multipath -ll

mpath10 (2bfc5148f7267432c5d7ce900ed0e9ff4) dm-2 Nimble,Server
[size=800G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=15][active]
 \_ 147:0:0:0 sdef       128:112 [active][ready]
 \_ 149:0:0:0 sdeg       128:128 [active][ready]
 \_ 148:0:0:0 sdeh       128:144 [active][ready]
 \_ 151:0:0:0 sdei       128:160 [active][ready]
 \_ 153:0:0:0 sdel       128:208 [active][ready]
 \_ 150:0:0:0 sdej       128:176 [active][ready]
 \_ 152:0:0:0 sdek       128:192 [active][ready]
 \_ 154:0:0:0 sdem       128:224 [active][ready]
 \_ 156:0:0:0 sden       128:240 [active][ready]
 \_ 158:0:0:0 sdeq       129:32  [active][ready]
 \_ 155:0:0:0 sdeo       129:0   [active][ready]
 \_ 157:0:0:0 sdep       129:16  [active][ready]
 \_ 159:0:0:0 sder       129:48  [active][ready]
 \_ 160:0:0:0 sdes       129:64  [active][ready]
 \_ 161:0:0:0 sdet       129:80  [active][ready]

and mount (or format) your volume:
mount /dev/mpath/mpath10 /myvolume

You can check ifconfig to confirm that all the ethernet adapters bound to iSCSI carry equal amounts of traffic, or use iptraf to check the packets.
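If you don't have iptraf handy, a quick sketch that dumps per-NIC byte counters straight from /proc/net/dev (present on any modern Linux): run it twice a few seconds apart and compare the deltas across your iSCSI-bound interfaces.

```shell
# Skip the two header lines, strip the trailing ":" from the interface
# name, then print interface, RX bytes (field 2), and TX bytes (field 10).
awk 'NR > 2 { sub(/:/, " "); printf "%-10s rx_bytes=%s tx_bytes=%s\n", $1, $2, $10 }' /proc/net/dev
```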

Thursday, November 3, 2011

Removing Yahoo Email Account on an Android

For those getting errors trying to remove Yahoo or any other email account...

1) Enable wifi
2) Connect to a wifi network
3) Restart the phone
4) Leave wifi on
5) Go to delete the account again

The restart step may or may not be necessary.