Why My Sites Went Dark

Tales of the Aggronaut circa May 2010

When Tales of the Aggronaut was born, I was in the fortuitous situation of knowing someone who worked for a web hosting provider. It was someone I hung out with on the regular and even occasionally raided with, so it was super simple for me to ping them via Google Chat when I needed help with something. However, things happened and that friend left the web host, which caused me to see significantly degraded support and performance. It was around this time that they also went through a round of restructuring which saw me not only paying more for my hosting plan but also having to swallow bandwidth and disk space usage overages.

The final straw that broke the camel's back was that I went through a period where every morning the hosting environment would be too unstable for me to actually publish content. The problem with the arrangement with my original host is that when they first started out years ago they saw themselves as a multi-tier hosting environment that would eventually play with the big dogs. However, as time progressed they put more of their effort into colocation and disaster recovery data centers and effectively stopped selling any of the hosting plans. I only got to stay on the service because I had been with them so long at this point, but they were theoretically unwilling to offer much in the way of real support.

I ultimately made the jump when I needed to find hosting for the church my wife attends. Since we would be footing the bill for it, I wanted to find some place cheap, reliable and without the spectre of bandwidth overages and disk usage caps. I had had bad experiences with Arvixe and InMotion in various work environments, and after a bit of back and forth I finally landed on BlueHost. After seeing how generally good the service was, I went through the process of moving my main sites, Aggronaut.com and Aggrochat.com, over as well, and for the most part everything has been pretty cromulent. Occasionally there would be some stability issues, but they would always resolve themselves quickly.

On May 19th that was not the case, and I experienced what felt like a server outage. Between roughly 6 am and 10 am my sites were experiencing intermittent connectivity issues. They alternated between loading fine and getting an assortment of Connection Reset, Connection Closed and Connection Timed Out errors in the browser. At the same time the CPanel associated with the environment my sites were all hosted on was exhibiting the same behavior, telling me that this was a server-level thing and not a specific problem with my sites.

This led me through a Dante's Inferno-esque series of useless web chat sessions for two hours while this was going on. I ultimately went through four or five different chat agents as one would mysteriously get disconnected and another would immediately join in their place… causing me to start from square one in explaining my case. Finally the last of these said that they would escalate the issue to the server techs and that I should receive an email back explaining what was wrong. Within 15 minutes of ending this chat, the waters had calmed and I assumed that something must have been happening on their end, ultimately putting it out of my mind.

Yesterday morning however I woke up to the nightmare scenario of seeing these littering my inbox. Sure enough when I hit any of my pages it threw a 404 error. I was completely dark to the world. I began freaking out, connected into CPanel and it seemed as though all of my files were still there and fine, and it was only after this that I noticed another email sitting in my inbox that arrived about 2 in the morning.

While reviewing your account regarding your website being down.  I noticed a few of your file permissions were strange and there was quite a bit of added files that were not named in a way that one would expect. This prompted me to run a malware scan. When the scan finished, it had found compromised files in your account, and in order to protect you, your website visitors and the stability/reputation of the server, we have temporarily suspended your account until the malware has been cleaned. 

This is just a snippet of the email, but it gets to the important part. Immediately following this is an attempt to upsell me cleaning services through some group that they partner with, and an advisement that there was now a “Malware.txt” file sitting in my site root that would explain what the scans found. My mind immediately went to the worst, because I have had to manually clean a site before. It is a slow and tedious process, but I also remembered that two days earlier I had gone through all of the processes to look for any possible signs of infection. I scoured directories looking for any new files that shouldn’t be there, so I was pretty certain my site was clean.

I download the text file and open it to see this. I feel like at this point I need to backtrack a bit to explain something. Having gone through the nonsense of having to manually clean a site before, I have both an Application Firewall and a Malware Scanner built into my WordPress installation. I regularly receive emails talking about various threats being blocked. What happens in this case is that the file gets dumped into a quarantine directory for me to inspect at my leisure, and this directory is walled off from the world with a .htaccess file. These were the strange file permissions that the tech was talking about and these were the supposed Malware files that they were referring to.

I was furious that they ran a scan, saw something… and immediately jumped to shutting off my account without making any attempt to interpret the resulting log files. There are lots of application firewalls available for WordPress at varying price points, and they all effectively do the same thing. There is a quarantine directory that intercepted files go to so that you can clean them if needed and restore them back to your site. This is something that a hosting environment should have encountered before, and they in fact offer similar tools that you can pay extra to have installed on your sites. Someone ran ClamAV, saw that there was a non-zero number of files reported and jumped straight to account suspension. To make matters worse, the “rate how I did” link included in the email did not work and told me that it had timed out already… for an email I had only received 5 hours prior.

This once again led me to return to Web Chat support, which was as bad as it was two days prior. Effectively I was left with the answer that it would take at least 24 hours for someone to come along, scan my site again, and restore it. After ranting to fellow bloggers on Twitter about this for a while, I finally decided to try one last avenue… contacting the Twitter based support account. It claimed to be staffed from 7 am EDT to 2 am EDT, so I figured it was worth a shot. It took about an hour for someone to respond, but what transpired after that was a series of DMs with the main Bluehost account. I was surly as hell at this point because all I wanted to do was scream obscenities… but reading back through my responses I mostly kept my shit together.

I managed to get them to rescan my directory, and got back a message saying that I still had infections. This led to what was ultimately the most hostile reply that I made during the entire exchange. “Seriously? Those went into the trash when I deleted them from before… do you not review these at all?” Apparently when I deleted all of the quarantined files earlier, they automatically went into a .trash folder in the root of my account. So now ClamAV was detecting them there, which again, if someone were actually looking at the results rather than just checking for a non-zero number, they would have been able to interpret.
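For the curious, the kind of triage I was hoping for is not hard to sketch. This is only a rough illustration in Python and not anything Bluehost or ClamAV actually does: it assumes the scan report boils down to a list of flagged file paths, and the "quarantine" folder name is a stand-in I made up for whatever directory a given security plugin uses (the .trash folder is the one from my account). The function names are hypothetical too.

```python
from pathlib import PurePosixPath

# Assumed directory names: ".trash" is the folder mentioned in this post,
# "quarantine" is a hypothetical stand-in for a security plugin's folder.
INERT_DIRS = {"quarantine": "quarantined", ".trash": "trashed"}


def triage(flagged_paths):
    """Group flagged paths by whether they sit in an inert folder or in live site files."""
    groups = {"quarantined": [], "trashed": [], "live": []}
    for raw in flagged_paths:
        # Look only at the directory components, not the file name itself.
        dirs = PurePosixPath(raw).parts[:-1]
        label = next((INERT_DIRS[d] for d in dirs if d in INERT_DIRS), "live")
        groups[label].append(raw)
    return groups


def has_live_hits(groups):
    """True only when something was flagged outside the inert folders."""
    return bool(groups["live"])
```

With something like this, a report where every hit sits under a quarantine or trash folder would read as "already contained" rather than "non-zero, so suspend", while a hit in the live site files would still stand out.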

Once I regained some composure, I made my way back into CPanel File Manager, emptied the .trash folder and got the support agent to scan my account again. To be fair, once I shifted away from using the Web Chat and over to Twitter support, the experience was fairly smooth all things considered. There is an inherent awkwardness in the delay built into asynchronous communication, but I figure for all future questions I will go straight to the Twitter option. The account was scanned once again and finally ClamAV returned a clean bill of health. The final step involved a frustrating sequence of me firing off authentication tokens via DM, because again the delay of the medium caused the tokens to time out before the agent could verify them.

At this point the Support Agent flipped a switch and restored everything back to its previous working condition. I still have no clue why their servers were unstable two days ago, and I likely never will. It was an extremely frustrating experience, but in the end once I switched over to Twitter support I managed to get a resolution. Bluehost is for the most part a fine hosting environment, and I have had more good luck with it than bad. However the support from the Web Chat is fairly abysmal. Any issues I have had before seem to magically resolve themselves shortly after getting off chat with someone who claims that they can’t help me.

At the end of the day, however, you get what you pay for, and I am not paying much for this hosting so I guess I am getting the overall support I deserve. The Twitter folks though are on point and I will be relying on them as my primary vehicle for support from this point forward. I could say “Well I Never!” in a huff and transfer hosting providers… but that is such a colossal pain in the ass to do and I am super lazy. For now I am just going to move on having vented about this tale and call it good.

5 thoughts on “Why My Sites Went Dark”

  1. I almost always choose the Twitter support option if a company offers that. So far, for several post/package delivery services as well as an electronics company, this has worked out fine! To name the last example: We had bought a new washing machine which got delivered but it had several dents in places where it shouldn’t have them. Of course, we told the company and asked them to replace it. On the support hotline, we got several weird and contradicting statements. Only when we contacted the support on Twitter did we get accurate information on what was happening and when we’d get a non-damaged washing machine (the delivery company picked up the dented one, then they kept it in their storage for several days and only when it was back with the company did they initiate (!) sending us a new non-dented one which took several days again until they finally managed to ship it off – in the meantime, we were told several lies over the support hotline including them saying that they contacted their law department to see if our claims were legit – and yes, they were. Customer rights etc. We had ordered a new washing machine, they delivered a broken one, we wanted the new washing machine that we had paid for including delivery that we had also paid for!).

  2. I’ll be honest – this is why I host my own sites on a virtual server. After having one hosting firm or another mess me around (Fasthosts, 1&1, etc.), I went with a VPS on Linode. Yeah, I had to buff up on my Linux knowledge, but it means I’m in complete control (and pay a flat rate for as many sites as the box will run). I’d never go back to the old style again.

    • Do you mind if I ask (ballpark) what you pay at the moment, Gazimoff? I started looking into some options for hosting on a cloud service as opposed to a more typical shared host setup as well recently. I started looking into AWS but the (potential) cost scared the bejeebus out of me in fairly quick order.

      I’m using SiteGround at the moment, and for the most part I’ve been happy with their service and support. They’ve fixed some fairly technical issues for me in pretty quick order in the past, my main complaint now is their fairly low limits on CPU seconds/utilisation.

      And they’re not the cheapest to start with, so the next tier of plan is just out of the question in the longer term. I’d be OK with paying a LITTLE more however, once my prepaid time is up.

      More generally — glad to hear you got your site issues sorted, Bel! Sorry you had to run through all that to do so though!

      • I pay $10 a month for a base Linode, which is powerful enough to run a handful of sites, and an extra $5 for daily backups in case I screw something up. I have other linodes which I use for other things, but this is sufficient for an ubuntu-nginx-php-mysql stack that serves a bunch of sites.
