SpyBot Blocks CasaleMedia
Sometimes software does the darndest things…and it can drive you crazy. That’s just what happened today when I attempted to log in to my publisher account at CasaleMedia.com. Oddly enough, my requests to the website were being redirected to the webserver running on my personal computer. Being a Windows XP box, I issued the following command to bring up the hosts file:
C:\WINDOWS\NOTEPAD.EXE C:\WINDOWS\SYSTEM32\DRIVERS\etc\HOSTS
Much to my surprise, I found a HUGE list of sites added by SpyBot Search & Destroy, with casalemedia being one of them. SpyBot maps these sites to 127.0.0.1, the computer owner’s own PC, as a means of blocking them. I’m not sure why Casale was added to that list, but here’s a small excerpt:
127.0.0.1 www.casalemedia.com
127.0.0.1 casalemedia.com
127.0.0.1 www.cashdeluxe.net
127.0.0.1 cashdeluxe.net
127.0.0.1 www.cashengines.com
127.0.0.1 cashengines.com
127.0.0.1 cashsearch.biz
127.0.0.1 www.cashsurfers.com
127.0.0.1 cashsurfers.com
127.0.0.1 www.CashUnlim.com
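If you want to check whether a particular site has been blocked this way without scrolling through the whole file, a few lines of Ruby will do it. This is only a rough sketch: the helper name blocked_hosts and the sample text are made up for illustration, and it assumes blocked entries always point at 127.0.0.1.

```ruby
# Hypothetical helper: list the hostnames a hosts file maps to the loopback address.
def blocked_hosts(hosts_text, loopback = "127.0.0.1")
  hosts_text.each_line.flat_map do |line|
    entry = line.sub(/#.*/, "").strip      # drop comments and surrounding whitespace
    next [] if entry.empty?                # skip blank and comment-only lines
    address, *names = entry.split          # first field is the IP, the rest are names
    address == loopback ? names.map(&:downcase) : []
  end
end

# Made-up sample in the same shape as the excerpt above.
sample = <<~HOSTS
  127.0.0.1 localhost
  # entries added by a blocking tool
  127.0.0.1 www.casalemedia.com
  127.0.0.1 casalemedia.com
HOSTS

blocked = blocked_hosts(sample) - ["localhost"]
puts blocked.include?("casalemedia.com") ? "blocked" : "not blocked"
```

On a real machine you would pass in File.read with the hosts file path from the command above instead of the sample string.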
Small Biz Fights Big Socialism
Sometimes people just don't get it. They have forgotten their history lessons and somehow feel that this time around it will be possible to have both socialism and a thriving economy. Fortunately, New York small business owners are reminding state lawmakers to reject plans that would extend family leave and increase the minimum wage.
The National Federation of Independent Business, a Washington-based advocacy group, conducted a survey of 11,000 members of its New York chapter and found that 92 percent opposed the family-leave measure, which would extend paid employee absences to 12 weeks. 83 percent opposed the increase in minimum wage.
Critics believe these measures will prevent small businesses from creating more jobs, especially in light of the current mortgage crisis.
My rant: 3 months of paid time off! WOW! I do not know anyone who wouldn't LOVE for their employer to pay them for 12 weeks off from work, but it's really not the responsibility of the business to finance what you do at home. It is amazing how the government so joyously pushes off its socialist agenda on businesses. Whether or not you believe the government should be responsible for providing for the people, we should all agree that businesses should NOT be responsible for this. And yet, for most Americans, it is the employer that provides them with health insurance, facilitates their retirement, pays for them if they become unemployed, pays for schools and other taxes unrelated to the business itself, and pays for family leave. When did it become alright for Small Business to be the whipping boy of Big Socialism?
Developer Nirvana - Amazon SimpleDB
Finally something worth talking about! I love it when Amazon sends me their webservice emails because they always seem to be cooking up something clever that’s going to make my job easier. First it was S3, then EC2, and now SimpleDB. In their own words:
Ahhhh. Doesn’t that sound nice? No software to maintain. No db server to maintain. No data modeling. No need to monitor storage space or server load. If this sounds good to you, check out the full page.
Blog Admin Nazis and Link Love
In an attempt to fight the evil legions of spammers that threaten to descend upon blogs everywhere, some bloggers have taken up arms and become what I endearingly refer to as "Blog Admin Nazis". Despite the "U Comment, I Follow" badges that adorn their pages, these warriors will not hesitate to decapitate your comment with one swift click of the mouse.
But is this fair? Is this in alignment with the "Do Follow" movement? Certainly spam needs to be deleted. Little one-line comments like "Loved this post!" or "I agree with Tom!" certainly deserve to vanish. However, my understanding of the “Do Follow” concept was that people who take the time to write a thoughtful post on a dofollow blog have earned the right to a link. Naturally, I’ve been surprised to visit some blogs that proudly display the “U Comment, I Follow” logo, and when I take the time to write out a thoughtful, on-target post, they remove my link, but keep my post. To me that is the ultimate insult. I’d rather they delete the post entirely than keep my contribution but deny me a link.
There you have it. You will find no Blog Admin Nazis on this site. Nothing but link love for those who take the time to write a thoughtful comment.
Rails Performance Diary
Although this blog post is titled Rails Performance Diary, it actually has more to do with SQL. For many websites, much of the work is being done by the database server. Such is the case for a simple dog site that I am working on. Having done quite a few websites using Ruby on Rails, I was surprised to find this simple site giving me some major performance problems. Pages were loading slowly and sometimes timing out, and there was no traffic to the site other than the developers.
Fortunately, the Rails log outputs all of the queries by default, which is always the first place that I look. And here's what I found -- 2 queries:
1) SELECT * FROM breeds
2)
SELECT breeds.id AS t0_r0, breeds.name AS t0_r1, breeds.alias AS t0_r2, breeds.lookup AS t0_r3, breeds.info AS t0_r4, breeds.line_breaks AS t0_r5, breeds.has_pics AS t0_r6, breeds.has_videos AS t0_r7,
pictures.id AS t1_r0, pictures.breed_id AS t1_r1, pictures.original AS t1_r2, pictures.large AS t1_r3, pictures.medium AS t1_r4, pictures.thumbnail AS t1_r5, pictures.caption AS t1_r6, pictures.email AS t1_r7, pictures.notes AS t1_r8, pictures.disabled AS t1_r9, pictures.order_field AS t1_r10, pictures.created_at AS t1_r11, pictures.filename AS t1_r12, pictures.breed_lookup AS t1_r13,
videos.id AS t2_r0, videos.breed_id AS t2_r1, videos.title AS t2_r2, videos.description AS t2_r3, videos.code AS t2_r4, videos.created_at AS t2_r5, videos.orig_url AS t2_r6, videos.embed_src AS t2_r7, videos.thumbnail AS t2_r8, videos.time_length AS t2_r9, videos.length_seconds AS t2_r10, videos.youtube_id AS t2_r11, videos.lookup AS t2_r12
FROM breeds
LEFT OUTER JOIN pictures ON pictures.breed_id = breeds.id
LEFT OUTER JOIN videos ON videos.breed_id = breeds.id
WHERE (breeds.lookup = 'airedaleterrier')
The next step was to run each of these queries in the MySQL Query Browser, so I could see how long it took the queries to run. The first query took 13 seconds to execute, so I felt that I had found the problem. Much to my chagrin, however, I found the second query to take over a minute!!! Clearly, neither one of these queries is acceptable, even with caching.
The First Query: SELECT * FROM breeds
This query was being used simply to get a list of the dog breeds so that I could link to each of their info pages, yet the query was returning the entire record, which includes a long "description" column containing a lot of text. This query was generated by ActiveRecord from the following code:
@breeds = Breed.find(:all)
To solve the problem, let's force a specific query that doesn't include the description field:
@breeds = Breed.find_by_sql("SELECT id, name FROM breeds")
Immediately, the query dropped from 13 seconds to a fraction of a second.
The Second Query: SELECT breeds.id AS t0_r0, breed....
Here was the stumper. This vast expanse of SQL was generated from the following Ruby:
@breed = Breed.find_by_lookup(lookup, :include => [:pictures, :videos])
Typically, including relations speeds things up because it allows an SQL join to be used rather than multiple SQL queries. But in this case, the join only reduces 3 queries to 1 query and incurs a large overhead to do so. Because pictures and videos are both joined to the same breed, every picture row gets paired with every video row, so the result grows as the product of the two counts rather than their sum, and each of those rows repeats all of the breed's columns. Now, I'm sure a veteran database wizard could enlighten you with SQL magic to solve this problem. But, not being such a wizard, I simply broke the query into 3 separate queries as shown below:
@breed = Breed.find_by_lookup(lookup)
@pictures = Picture.find(:all, :conditions => ["breed_id = ?", @breed.id])
@videos = Video.find(:all, :conditions => ["breed_id = ?", @breed.id])
Voila! Each of the 3 queries executes in less than a second!
Note: although it's not mentioned above, I also added appropriate indexes to each of the database tables to speed up the queries. These did not help much, but are always a good practice.
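To get a feel for why the joined query balloons, here is a back-of-the-envelope sketch in plain Ruby. The counts (40 pictures, 25 videos) are made-up numbers for illustration, not figures from the actual site, and the helper names are hypothetical.

```ruby
# Rows returned when one breed is LEFT OUTER JOINed to both pictures and videos.
# Every picture is paired with every video; an empty side still yields one
# NULL-padded row, hence the floor of 1.
def joined_row_count(picture_count, video_count)
  [picture_count, 1].max * [video_count, 1].max
end

# Rows returned by three separate queries: the breed, its pictures, its videos.
def separate_row_count(picture_count, video_count)
  1 + picture_count + video_count
end

pictures = 40  # hypothetical count
videos   = 25  # hypothetical count

puts "one joined query:       #{joined_row_count(pictures, videos)} rows"
puts "three separate queries: #{separate_row_count(pictures, videos)} rows"
```

With these numbers the join returns 1000 rows against 66 for the separate queries, and every one of the joined rows also carries a full copy of the breed record.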
CouchDB: No Threat to RDBMS
Eventually I plan on getting around to doing some in-depth Erlang programming. Until then, it's nice to read up on current projects like the much-hyped CouchDB. According to the CouchDB website...

CouchDb is
- A document database server, accessible via a RESTful JSON API.
- Ad-hoc and schema-free with a flat address space.
- Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
- Query-able and index-able, featuring a table oriented reporting engine that uses Javascript as a query language.
...And although they claim it is not a replacement for RDBMSs, there are people who think it is. I have to say, this project looks very promising. I can envision lots of uses for it in which it would simplify apps that just want to quickly save some data. Or apps that are storing lots of "objects" that all differ slightly. Perhaps aggregating content from multiple sources. I love stuff like this that gets you to think about something in a whole new light.
But, here's the problem. When a nifty project like this becomes over-sold and pushed beyond its expectations, I can't help but put on my critical hat. The problem is fueled by the fact that there are lots of Erlang programmers out there that are itching to see Erlang take off, so they'll gladly prop up an Erlang project with inflated claims. Let's take a look at some of the reasons why CouchDB will never be a threat to RDBMSs and some of the fallacies being spread.
No fixed columns = less ability to optimize
It's been a while since my database theory class in grad school, but I seem to recall discussion about the fixed structure of database columns as a key ingredient in achieving high performance. Knowing the distance that each record takes up in a binary file allows the data to be read much faster. And extracting the necessary record(s) from the file is where most of the time is spent for a given query. Because CouchDB takes a "no fixed structure" approach, it seems that it will lose to an RDBMS in disk access, which is the most time-consuming operation.
Bidirectional replication and peer setup = extra overhead or transactional issues
Replication is not my area of expertise, but I do know how to count and how to think, so let's do some basic math and logic. For simplicity, let's pretend we have 4 beefy db servers. In configuration A, we have 1 master server and 3 slave servers. Whenever there is an update it takes place on the master server and the changes are sent to the 3 slave servers simultaneously. Thus, 1 update = 3 messages.
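The counting for configuration A is simple arithmetic, and the same kind of counting can be applied to peer layouts, where the number of hops depends on how the peers are wired together. Here is a rough sketch of that arithmetic; the node names and the two topologies are hypothetical, and it only counts messages and hops, not real replication traffic.

```ruby
# Messages for one update when a master pushes directly to each slave.
def master_slave_messages(slave_count)
  slave_count
end

# Worst-case number of hops for an update to reach every peer, found with a
# breadth-first search over an adjacency list (peer => directly connected peers).
def max_hops(topology, origin)
  distance = { origin => 0 }
  queue = [origin]
  until queue.empty?
    node = queue.shift
    topology[node].each do |neighbor|
      next if distance.key?(neighbor)      # already reached by a shorter path
      distance[neighbor] = distance[node] + 1
      queue << neighbor
    end
  end
  distance.values.max
end

# Four peers, each connected to only two others (a ring).
ring = { a: [:b, :d], b: [:a, :c], c: [:b, :d], d: [:c, :a] }
# Four peers, each connected to every other peer (a full mesh).
mesh = { a: [:b, :c, :d], b: [:a, :c, :d], c: [:a, :b, :d], d: [:a, :b, :c] }

puts master_slave_messages(3)  # 3
puts max_hops(ring, :a)        # 2
puts max_hops(mesh, :a)        # 1
```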
In configuration B, we have 4 server nodes connected in a peer-to-peer fashion. If the machines are each connected to no more than 2 other machines, then messages will have to travel along multiple hops, which would increase the number of messages and the time it takes for the message to be broadcast to all peers. So, instead, let's assume that each peer is connected to every other peer. Now, 1 update = 3 messages, but there is a different problem that has cropped up. Any peer wishing to update a record will have to check with every other peer for changes first before it can safely do so. Or, peers will have to be assigned as authoritative for subsets of the global data. I'm not saying the P2P way of doing things is bad. In fact, I think it allows for the system to scale rather easily. But, we have to acknowledge the performance and transaction issues that come with this approach.
No tables = less structure = worse performance
Every means of organizing data also leads to performance optimizations. Storing data in tables is an additional means of classifying the data, which allows for faster extraction later. CouchDB's table-less approach will be great for allowing people to easily store and retrieve things because they don't need to think about tables, but there will be a performance hit.
Transactional weakness = no enterprise use
Ooops. I said the 'e' word. I know, I know. I don't like when people throw around the word "enterprise" in a demeaning way. Like "Your Ruby is not as enterprise-ish as my Java or .Net or C++". But, sometimes when the shoe fits... Personally, I would like to know that Fidelity Investments is safeguarding the integrity of its data and every transaction that takes place so that my account balance is accurate. Here's my favorite scary quote on this issue:
"Features like referential integrity, constraints and atomic updates are really important in the client-server world, but irrelevant in a world of services."
You're kidding, right? So, in a world of services, you don't need transactional support across a number of operations? And referential integrity is for the birds? That's fine if CouchDB wants to skimp on transactional support and use optimistic locking to attempt to boost performance, but you can't have your cake and eat it too. That may work for some apps, but it's not going to fly for ones in which the data and the operations on that data are critical to the business and stakeholders.
C is faster than Erlang
I don't program in C because it takes too long and is way too tedious. But you better believe the workhorse of my web apps (usually a database server) is going to be programmed in C. As mentioned previously, CouchDB is programmed in Erlang while all the major database players use C. I don't think this point needs to be discussed further.
I know this post is coming across a little hard on CouchDB, but it seems there needs to be some balance in the discussion. This seems like a great project, but let's acknowledge that it has some strengths and some weaknesses. It's a great concept, but as the documentation states, it's not a replacement for relational databases.
Votan Website Marketplace: Re-Opens for Business
The VotanWeb website marketplace has re-opened and is now FREE to buyers and sellers! There are still paid packages available for those that are interested, but we wanted to open up the majority of the site at no cost to YOU. Even the Premium Listings for sellers are free to those sites that have an Alexa traffic rank of less than 250K.
Whatever Happened to Less Code?
When I heard about the C# Application Markup Language, CSAML, (http://www.charlespetzold.com/etc/CSAML.html) I was a little intrigued by the idea of being able to program declaratively in C#. HTML is a good example of how markup can be more concise than code, while allowing people of all degrees of skill to create web pages.
Then I saw the Hello World application… IMHO, 10 lines of code for a C# Hello World app is too long, but the CSAML Hello World app came in at 20 lines of ugly, bloated XML. How is this an improvement????

