Elliott C. Back: In Aere Aedificare

FCC Definition for Broadband now 768Kbps

Posted in Performance, Government, Downloads, DSL by Elliott Back on March 22nd, 2008.

According to the FCC, the term “broadband” now means 768Kbps, up from the previous definition of 200Kbps. Under the new definition, “basic broadband” covers download speeds between 768Kbps and 1.5Mbps. Other changes in how subscribers are reported include a breakdown of upload and download speed and additional gradations of speed. News dot COM notes that “ISPs will not have to report the prices they charge, yet.”

For comparison’s sake, an average movie download is 700 MB (5872025600 bits), and would take 8.16 hours to download under the old broadband definition at 200Kbps. However, at the new faster rate of 768Kbps, an American with basic broadband will be able to download a movie in just 2.12 hours.
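The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes 1 MB = 1024 × 1024 bytes and 1 Kbps = 1000 bits per second, which is what the bit count quoted above implies, and the function name is mine.

```python
def download_hours(size_mb: float, rate_kbps: float) -> float:
    """Hours needed to transfer size_mb megabytes at rate_kbps kilobits per second."""
    bits = size_mb * 1024 * 1024 * 8       # 700 MB -> 5,872,025,600 bits
    seconds = bits / (rate_kbps * 1000)    # assumes 1 Kbps = 1000 bits per second
    return seconds / 3600

for rate in (200, 768):
    print(f"{rate}Kbps: {download_hours(700, rate):.2f} hours")
```

That prints 8.16 hours for the old definition and 2.12 hours for the new one, ignoring protocol overhead and assuming the line actually sustains its rated speed.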

Broadband reporting is a problem for America, because up till now we could only point to useless studies indicating that 12.5% of internet users are still on 56.6k or worse speeds. Once politicians and the industry realize how bad broadband penetration in the US really is, we’ll see better internet service and connectivity.

Ruby vs PHP Performance Revisited

Posted in Code, Web 2.0, Scalability, Performance by Elliott Back on January 17th, 2008.

Ignoring any of Hongli Lai’s actual code, I reran the PHP, Ruby, C++, Perl, and Python mergesort benchmarks he gave, and came up with substantially different results. Here are the versions of the programming languages I am using for the test:

  • PHP - PHP 5.1.6 (cli) (built: Sep 18 2007 09:07:28)
  • Ruby - ruby 1.8.5 (2007-09-24 patchlevel 114) [x86_64-linux]
  • Perl - This is perl, v5.8.8 built for x86_64-linux-thread-multi
  • Python - Python 2.4.4 (#1, Oct 23 2006, 13:58:18)
  • C++ - gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)
  • Java - Java(TM) SE Runtime Environment (build 1.6.0_10-ea-b10)

You’ll notice I’m adding Java into the mix for fun. Here are the results, over 10 runs, on the Intel dual-core 1.80GHz machine with 2GB of RAM currently running this website:

mergesort-performance.png

Lang	Average	Min	Max
PHP	8.8325	8.637	9.303
Ruby	7.2896	7.143	7.729
Perl	4.3231	4.262	4.428
Python	3.3465	3.289	3.417
C++	0.5638	0.53	0.609
Java	0.4062	0.262	0.551

There are a couple of important conclusions to note here that are significantly different from Hongli Lai’s:

  • PHP is 21% slower than Ruby, not 41% as in his benchmark
  • Python is 29% faster than Perl, not 17% as in his benchmark
  • Java runs this 39% faster than C++, and 2100% faster than PHP
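For anyone who wants to check those percentages, here is a small sketch that recomputes them from the Average column of the table. It assumes “X% slower” means the ratio of the two average times minus one; the helper name is mine.

```python
# Average run times copied from the table above.
averages = {
    "PHP": 8.8325,
    "Ruby": 7.2896,
    "Perl": 4.3231,
    "Python": 3.3465,
    "C++": 0.5638,
    "Java": 0.4062,
}

def percent_slower(slow: str, fast: str) -> float:
    """How much longer `slow` takes than `fast`, as a percentage of `fast`'s time."""
    return (averages[slow] / averages[fast] - 1) * 100

for slow, fast in [("PHP", "Ruby"), ("Perl", "Python"), ("C++", "Java"), ("PHP", "Java")]:
    print(f"{slow} takes {percent_slower(slow, fast):.0f}% longer than {fast}")
```

By this convention the PHP versus Java figure comes out near 2074%, which rounds to the 2100% quoted above.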

So, PHP is slower than Ruby, but not quite as slow as Hongli Lai would have you believe. Python is the fastest scripting language in this benchmark, while Java is the fastest language all around, and is incredibly, incredibly fast. Maybe all of our code should start using Java!

* NOTE: I am ignoring the obvious deficiencies of this micro-benchmark and just trying to reproduce it. What I’ve found is that there are significant discrepancies between Hongli Lai’s run of the tests and my own, probably owing to slightly different versions of the components involved. Also, if I make some trivial optimizations to the loops in the PHP script, I can get it to run faster than everything but C++, in about 2.4s. Then again, just calling sort() is faster by another two orders of magnitude… but still half as fast as Java’s built-in sort… and two orders of magnitude slower than Perl’s built-in.

Benchmarking Wordpress with Apache Bench

Posted in Blogging, Scalability, Performance, WP by Elliott Back on January 14th, 2008.

A lot of people talk about Wordpress performance, and how to get a webserver to perform as efficiently as possible. However, without a quantifiable methodology for testing website performance, you can’t actually talk about it. ApacheBench (ab) is the solution to the problem of measuring website performance. What is ApacheBench? The man page provides a suitable answer:

ab - Apache HTTP server benchmarking tool

ab is a tool for benchmarking your Apache Hypertext Transfer Protocol (HTTP) server. It is designed to give you an impression of how your current Apache installation performs. This especially shows you how many requests per second your Apache installation is capable of serving.

If you have installed apache or apache-devel, you should be able to simply invoke ab by typing it on the command line. For example, to benchmark my own site here, I would write:

[root ~]# ab -n 10000 -c 100 http://elliottback.com/wp/

This says “make 10,000 requests in total to host elliottback.com via HTTP, requesting /wp/, with up to 100 requests running concurrently.” The result of this is the following report:

This is ApacheBench, Version 2.0.40-dev < $Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking elliottback.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests

Server Software: Apache/2.2.6
Server Hostname: elliottback.com
Server Port: 80

Document Path: /wp/
Document Length: 34331 bytes

Concurrency Level: 100
Time taken for tests: 13.596345 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 346230000 bytes
HTML transferred: 343310000 bytes
Requests per second: 735.49 [#/sec] (mean)
Time per request: 135.963 [ms] (mean)
Time per request: 1.360 [ms] (mean, across all concurrent requests)
Transfer rate: 24868.08 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.6      0      20
Processing:     8  134  12.7    132     190
Waiting:        4  134  12.7    132     190
Total:         16  134  12.1    132     190

Percentage of the requests served within a certain time (ms)
50% 132
66% 134
75% 136
80% 137
90% 145
95% 160
98% 175
99% 179
100% 190 (longest request)

According to these numbers, my dual core server can do about 735 requests per second, fulfilling each in about 136ms on average at a concurrency of 100. That’s pretty fast, probably because I know the secrets of Wordpress Optimization. If you make every layer as fast as it can be, and cache heavily, you too can see lightning fast Wordpress installations!
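The headline figures in the report follow directly from three inputs: the number of completed requests, the concurrency level, and the total time taken. Here is a rough sketch of how they relate, using the values from the report above. The function and key names are mine, not part of ab.

```python
def ab_summary(complete_requests: int, concurrency: int, time_taken_s: float) -> dict:
    """Recompute ab's headline figures from its raw inputs."""
    per_request_all_ms = time_taken_s * 1000 / complete_requests
    return {
        # Throughput: completed requests divided by wall-clock time.
        "requests_per_second": complete_requests / time_taken_s,
        # Mean latency seen by one client when `concurrency` requests are in flight.
        "time_per_request_ms": per_request_all_ms * concurrency,
        # Mean time per request averaged across all concurrent requests.
        "time_per_request_all_ms": per_request_all_ms,
    }

summary = ab_summary(complete_requests=10000, concurrency=100, time_taken_s=13.596345)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The three values should line up with the “Requests per second” and the two “Time per request” lines in the report, which is a handy sanity check when comparing runs at different -n and -c settings.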

Mark Cuban’s P2P Ideas Suck

Posted in P2P, Scalability, Performance, bit torrent, bittorrent, Celebrities by Elliott Back on November 25th, 2007.

In a three-part rant about peer-to-peer technologies (1, 2, 3), Mark Cuban demands that peer-to-peer technologies “die a quick death” in order to “speed up [his own] internet connection.” He suggests that “Google Video is a far better solution for audio and video distribution than any P2P solution” and that cable companies “charge for upstream bandwidth usage.”

Guess what–I already get charged for all the bandwidth I use, either up or down. When Verizon strings a fiberoptic cable to my home, I’m getting a certain amount of fixed capacity into the greater internet at large. If I want to trade a little upstream capacity for greater downstream capacity, that’s my call! Have you ever noticed that downloading over http is typically slow because there are 100s of clients and 1 host? If I download the same information over bittorrent, I can sustain 12Mbs because everyone is a server–including me. Distributed protocols, such as the ones powering Amazon Dynamo or bittorrent, are more efficient, cost effective, and fault tolerant than single-server models.

Reactions around the blogosphere indicate that Mark Cuban’s thoughts on P2P are nonsensical rubbish. Mashable calls him “a guy who does not understand how P2P works, and yet he wants it shut down.” Ars Technica notes that “if users who are currently saturating their connections with BitTorrent start saturating their connections with Google Video content, the end result is more or less the same.” And a slashdotter comments, “Just imagine how fast the internet would be if there were no content to view. After P2Ps gone, get rid of all these freeloading websites, emails, etc. and it will be blisteringly fast.”

My guess is that billionaire Mark Cuban has a slow, shared cable internet connection at home, the modern equivalent of a party line. This might lead him to confuse his own slow internet connection with a greater systemic problem. What he should be complaining about is why Verizon hasn’t strung fiber in his area yet.

Denial of Service Attack (DOS), Grrr….

Posted in My Blog, Spam, Performance, Hacking, WTF by Elliott Back on November 4th, 2007.

Today I had the pleasure of a random guy in Mexico recursively downloading as much of my site as he could, which sent my CPU load to 2.0, a level that Dreamhost would find acceptable but which I personally freak out about. The r-dns and IP of this guy are:

dsl-189-171-15-59.prod-infinitum.com.mx
189.171.15.59

He started at 04/Nov/2007:12:04:36 and ended (by iptables ban) at 04/Nov/2007:20:17:03. In those 8 hours and twelve minutes, he made over 250,000 requests. That’s an extra 8.5 requests per second from a single IP, which is clearly unacceptable behavior:

[root@fc624389 ~]# cat access_log | grep 189.171.15.59 | wc -l
251923
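The requests-per-second figure can be rederived from the two timestamps and the request count. This is a minimal sketch, assuming both timestamps are in the same timezone and that month abbreviations are parsed in an English locale; the variable names are mine.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S"  # Apache access-log timestamp, minus the UTC offset

start = datetime.strptime("04/Nov/2007:12:04:36", LOG_TIME_FORMAT)
end = datetime.strptime("04/Nov/2007:20:17:03", LOG_TIME_FORMAT)
requests = 251923  # from the wc -l count above

elapsed_s = (end - start).total_seconds()
rate = requests / elapsed_s

print(f"elapsed: {elapsed_s:.0f} seconds")
print(f"rate: {rate:.1f} requests per second")
```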

If you don’t believe me, the next biggest offender over the last 24 hours made only 4,400 requests:

[root@fc624389 ~]# cat access_log | cut -d' ' -f1 | sort -n | uniq -c | sort -nr | more
251923 189.171.15.59
4403 66.249.73.116
2012 76.88.78.239
1646 70.141.105.233
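If shell pipelines aren’t your thing, the same per-IP tally can be sketched in Python. This assumes, as the cut command does, that the client IP is the first space-separated field of each log line. The sample lines below are hypothetical stand-ins, not entries from my real log.

```python
from collections import Counter

def top_clients(log_lines, n=4):
    """Return the n most frequent client IPs as (ip, request_count) pairs."""
    counts = Counter(
        line.split(" ", 1)[0]          # first space-separated field is the client IP
        for line in log_lines
        if line.strip()                # skip blank lines
    )
    return counts.most_common(n)

# Hypothetical sample lines for illustration only.
sample = [
    '189.171.15.59 - - [04/Nov/2007:12:04:38 -0500] "GET /a HTTP/1.1" 200 467',
    '189.171.15.59 - - [04/Nov/2007:12:04:39 -0500] "GET /b HTTP/1.1" 200 512',
    '66.249.73.116 - - [04/Nov/2007:12:05:01 -0500] "GET /a HTTP/1.1" 200 467',
]

for ip, count in top_clients(sample):
    print(count, ip)
```

To run it against a real log, pass an open file object (for example, the result of open("access_log")) in place of the sample list.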

The user agent of this guy doesn’t tell *me* anything about him, but maybe one of you readers has an idea?

189.171.15.59 - - [04/Nov/2007:12:04:38 -0500] "GET /wp-content/themes/greenmarinee/images/links_bullet.gif HTTP/1.1" 200 467 "http://celebrity-photos.elliottback.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Media Center PC 3.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"

Another thing that bugs me is he requested each URL about 7 times. WTF? Do you really need to spider my site as fast as you can seven times?

[root@fc624389 ~]# cat access_log | grep 189.171.15.59 | cut -d' ' -f11 | sort | uniq | wc -l
35414

I am thinking of either writing a very evil script to confuse non-google/msn/live/ask/yahoo bots by writing an infinite number of invisible links into my websites, or installing some kind of mod_throttle into my Apache. It looks like mod_limitipconn might help here, too.
