Exclude-by-Keyword: Thoughts on Spam and Robots.txt

Note: This solution is for spam that cannot be filtered. There are already wonderful tools to help with comment / forum / wikispam, such as LinkSleeve and Akismet. However, this proposed method would guard against the more nefarious techniques such as HTML Injection, XSS, and Parasitic Hosting. Truth be told, I rarely use the Robots.txt file. Its functionality can be largely replicated on a page-by-page basis via the robots META tag and, frankly, we spend a lot more time getting pages into the SERPs than excluding them. However, after running / creating several large communities with tons of user-generated content, I realized that the Robots.txt file could offer far more powerful tools for exclusion. Essentially, exclude-by-keyword. The truth is,...
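To make the idea concrete, a keyword-based directive might look something like the sketch below. To be clear: this syntax is purely my own invention for illustration — no crawler supports anything like it, and the Robots Exclusion Protocol today only matches on URL paths, not page content.

```
# Hypothetical syntax — NOT part of the robots.txt standard
User-agent: *
Disallow-keyword: viagra
Disallow-keyword: texas holdem
```

A crawler honoring such a directive would decline to index any page whose content matched a listed keyword, which would strip the search-engine value out of injected links and parasitically hosted pages even when the spam itself can't be filtered out.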

Ad Blocking is Immoral

After a terrible write-up claiming that Ad Blocking is Moral made the front page of Reddit, I felt obliged to respond. First, a brief response. For lack of a better word (actually, this is pretty much the perfect word), the piece is drivel. It casts the ethics of Ad Blocking (visitor) and Ad Serving (publisher) in terms of effectiveness, relevancy, and business modeling. While these may all be useful arguments about whether a publisher ought to use advertising to generate revenue, they do not add up to a meaningful ethical statement on whether subverting advertising efforts is moral. Examples: Sarcastic Response: “In other words, people should support bad business models because it’s more convenient for the businessmen.” Not supporting a bad business...

Google’s New Algorithm: if($domain=='wikipedia.org'){$rank=1;}

I decided to take my last look at Wikipedia a little further to see just how much Google has sold out to everyone's-favorite-online-encyclopedia. Does Wikipedia really deserve the rankings that it receives? Or, as many suggest, do individual Wikipedia entries receive the benefit of the doubt from Google’s algorithm because of the strength of the domain? While this is merely anecdotal evidence (I have yet to run a complete check as I did before), I wanted to take a look at an example where Wikipedia recently took the #1 position. I chose the keyword Entrepreneurship. This simple test looked at the number of inbound links pointing directly to the page currently ranking in Google. Instead of looking at site-wide links, we look at the ranking page itself while excluding...

Google Misses Earnings: Is Anyone Surprised?

It blows my mind that forecasters, including Google’s own, did not more clearly account for the impact that the company's June attack on arbitrage would have on its earnings. Google, bowing to pressure from general advertisers and searchers, decided to bite the hand that feeds it – the masses of Adsense publishers and Adwords arbitrage experts who poured money into Google. While the crackdown was targeted and only directly affected a handful of publishers and advertisers, the ripple effect (damn those ripple effects) spread throughout the industry, sounding the bell tone: “if Google will go after Arbitrage, eventually they will go after Made-For-Adsense.” I would love to see the defection rate of this past quarter compared...

Prevent Harry Potter Spoilers on Digg with Greasemonkey

So, it is inevitable. Some idiots are going to read the last page of the Deathly Hallows and spout out “Harry Potter Lives!” or “Harry Potter Dies!” before the rest of us even get home to pick up the book, much less read it like a real person. In anticipation of these spoilers, I went looking for a way to exclude stories by keyword when browsing Digg. (This is also very useful for getting rid of Paris Hilton and iPhone stories, which are growing very tiresome.) The solution: Digg Washer. This beauty of a Greasemonkey script makes it very easy to hide stories based on keywords. Unfortunately, it has not been updated since the last Digg revision, so you can either make a slight code change yourself (mentioned in the comments on the...
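For the curious, the core of such a userscript can be sketched in a few lines. This is a minimal, generic version — not Digg Washer's actual code — and the `h3 a` selector is only a guess at Digg's markup; the keyword list is mine:

```javascript
// ==UserScript==
// @name     Keyword Story Hider (sketch)
// @include  http://digg.com/*
// ==/UserScript==

// Keywords to filter out — edit to taste.
var BLOCKED = ['deathly hallows', 'harry potter', 'paris hilton', 'iphone'];

// Returns true if a story title contains any blocked keyword
// (case-insensitive substring match).
function isBlocked(title) {
  var lower = title.toLowerCase();
  for (var i = 0; i < BLOCKED.length; i++) {
    if (lower.indexOf(BLOCKED[i]) !== -1) return true;
  }
  return false;
}

// In the browser, walk the headline links and hide matching stories.
// (Guarded so the matching logic above can also run outside a page.)
if (typeof document !== 'undefined') {
  var links = document.querySelectorAll('h3 a');
  for (var j = 0; j < links.length; j++) {
    if (isBlocked(links[j].textContent)) {
      // Hide the enclosing story container — again, a placeholder
      // for whatever element Digg actually wraps each story in.
      var story = links[j].parentNode;
      if (story) story.style.display = 'none';
    }
  }
}
```

The matching is deliberately dumb — plain substring checks — which is exactly why a keyword like "harry potter" catches both the "Lives!" and "Dies" variants of the spoiler headline.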