Elliott C. Back: Technology FTW!

WP Super Cache Benchmark

Posted in Blogging, Performance, Plugins, Scalability, WP, Wordpress by Elliott Back on September 28th, 2008.

If you’ve thought about whether upgrading from WP Cache 2.0 to WP Super Cache is a good idea, hopefully this benchmark will convince you. I followed my instructions on benchmarking Wordpress with Apache Bench on four configurations of this blog’s main page to measure performance:

  1. Without any caching plugins
  2. With WP Cache 2.0
  3. With WP Super Cache (no compression)
  4. With WP Super Cache (compression enabled)

wp-caching-plugins.png

The results show that WP Super Cache is the clear winner: with compression enabled, it served about 2.25 times as many requests per second as the older WP Cache (roughly 125% more). Here is the raw data I gathered during the test:

No caching:
Requests per second: 22.81 [#/sec] (mean)
Time per request: 4383.559 [ms] (mean)
Time per request: 43.836 [ms] (mean, across all concurrent requests)
Transfer rate: 613.75 [Kbytes/sec] received

WP cache:
Requests per second: 872.30 [#/sec] (mean)
Time per request: 114.640 [ms] (mean)
Time per request: 1.146 [ms] (mean, across all concurrent requests)
Transfer rate: 23549.46 [Kbytes/sec] received

Super cache (no compression):
Requests per second: 1518.90 [#/sec] (mean)
Time per request: 65.837 [ms] (mean)
Time per request: 0.658 [ms] (mean, across all concurrent requests)
Transfer rate: 41150.81 [Kbytes/sec] received

Super cache (compression):
Requests per second: 1960.39 [#/sec] (mean)
Time per request: 51.010 [ms] (mean)
Time per request: 0.510 [ms] (mean, across all concurrent requests)
Transfer rate: 53108.70 [Kbytes/sec] received
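
As a quick sanity check, the ratios between the four configurations can be recomputed from the requests-per-second figures above. This is only a small Python sketch; the dictionary keys and the `speedup` helper are names made up for illustration, not part of ab or either plugin.

```python
# Requests per second, copied from the raw ab results above.
rps = {
    "no caching": 22.81,
    "wp cache": 872.30,
    "super cache": 1518.90,
    "super cache + compression": 1960.39,
}

def speedup(new, old):
    """Return the ratio of `new`'s requests per second to `old`'s."""
    return rps[new] / rps[old]

for name in rps:
    print(f"{name:28s} {speedup(name, 'wp cache'):6.2f}x WP Cache, "
          f"{speedup(name, 'no caching'):6.1f}x no caching")
```

A ratio of 2.0 means twice the throughput, which is the same thing as "100% more".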

For more tips on how to improve your Wordpress performance, check out Wordpress Performance: Why My Site Is So Much Faster Than Yours. Another interesting WP caching plugin is Batcache, which uses a memcached backend to serve requests out of the RAM of a cluster of machines.

How many users does DIGG have?

Posted in Blogging, Quantitative, Scalability, Science, Web 2.0 by Elliott Back on February 3rd, 2008.

When John Graham-Cumming asked the question How Many Users Does Digg Have?, there were a few things he couldn’t tell you, since his data consisted of randomly self-sampled users. Well, with the power of two PHP scripts, we can pull large amounts of user data and form queries. Our first question is how has DIGG grown over time?

digg-users-over-time.png
A graph of 187,054 randomly sampled digg users, plotted against when they joined

This doesn’t tell us much, though, about how many DIGG users there actually are, or how active they are, so I plotted a histogram of the number of times these 200k users’ profiles had been viewed; the answer, unsurprisingly, is not very often in most cases:

digg-profile-views-histogram.png
83% of users had fewer than 50 profile views

And what about users who are active? How many people are digging stories every day? The answer is very few. I took a random subsample of 29,225 users from the previous sample and used the DIGG API to query for their last digg. It turns out 31% (9125) had never dugg anything! After I removed those, here is the histogram I got:

digg-last-dugg.png
About 15% of Digg users dugg a story in the last week

Concluding thoughts

Digg boasts an official tally of 2.2M users, but at most 20% of them can be considered real, active users. That would bring their user count down to 440,000, far fewer than a popular web 2.0 boom child should be boasting about, and it significantly hurts the $300M (or ~$700 per active user) valuation that they keep trying to get.
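
The arithmetic behind that conclusion is easy to redo. Here is a minimal Python sketch using only the figures quoted above (the variable names are mine); the per-user number comes out a little under $700, which I rounded up.

```python
official_users = 2_200_000   # Digg's official tally, from the post
active_share = 0.20          # upper bound on active users, from the post
valuation = 300_000_000      # asking valuation in dollars, from the post

active_users = official_users * active_share
print(f"active users:      {active_users:,.0f}")
print(f"per active user:   ${valuation / active_users:,.0f}")
print(f"per official user: ${valuation / official_users:,.0f}")
```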

Code Appendix

The {digg user, time joined, digg id, profile page views} information was gathered by the following script:

<?php
    error_reporting(E_ALL);
    ini_set('user_agent', 'My-Application/2.5');
    ini_set("include_path", ".:/usr/share/pear");
    require_once 'Services/Digg.php';
    require_once 'Services/Digg/Response/php.php';

    $base = 'http://services.digg.com/users/?appkey=http://example.com&type=php';

    // Ask for zero users just to learn the total user count.
    $data = unserialize(file_get_contents($base.'&count=0'));

    $total = $data->total;
    echo "There are $total total users\n";
    echo "ID,Number,Name,Date,Views\n";

    // Pull 1000 random windows of 100 users each.
    for($i = 0; $i < 1000; $i++){
        $offset = rand(0, $total - 100);
        $data = unserialize(@file_get_contents($base.'&count=100&offset='.$offset));

        $j = 0;
        foreach($data->users as $user){
            // Scrape the profile page for the numeric user id.
            $page = @file_get_contents('http://digg.com/users/'.$user->name.'/');

            if(!$page)
                continue;

            preg_match('/id="userid" value="(\d+)"/i', $page, $matches);
            echo $matches[1] . ",";
            echo ($offset + $j++) . ",";
            echo $user->name . ",";
            echo $user->registered . ",";
            echo $user->profileviews . "\n";
        }
    }
?>
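
The sampling idea in that script (pick a random offset into the full user listing, pull a window of 100 users, repeat) can be sketched without touching the network. In the Python sketch below, `fetch_page` is a hypothetical stand-in that returns fake records rather than calling the Digg API, and the overlap handling is my own addition; the PHP script does not deduplicate.

```python
import random

PAGE_SIZE = 100  # users requested per call, as in the PHP script

def fetch_page(offset, count):
    """Hypothetical stand-in for the users API call: returns fake user records."""
    return [{"name": f"user{n}", "number": n} for n in range(offset, offset + count)]

def sample_users(total, pages, rng=random):
    """Pull `pages` random windows of PAGE_SIZE users, keyed by position to drop overlaps."""
    seen = {}
    for _ in range(pages):
        offset = rng.randint(0, total - PAGE_SIZE)  # inclusive bounds, like PHP's rand()
        for user in fetch_page(offset, PAGE_SIZE):
            seen[user["number"]] = user
    return list(seen.values())

sample = sample_users(total=2_200_000, pages=10, rng=random.Random(42))
print(len(sample), "distinct users sampled")
```

Note that users come back in contiguous runs of 100, so this samples random windows rather than independent random users.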

Ruby vs PHP Performance Revisited

Posted in Code, Performance, Scalability, Web 2.0 by Elliott Back on January 17th, 2008.

Ignoring any of Hongli Lai’s actual code, I reran the PHP, Ruby, C++, Perl, and Python mergesort benchmarks he gave, and came up with substantially different results. Here are the versions of the programming languages I am using for the test:

  • PHP - PHP 5.1.6 (cli) (built: Sep 18 2007 09:07:28)
  • Ruby - ruby 1.8.5 (2007-09-24 patchlevel 114) [x86_64-linux]
  • Perl - This is perl, v5.8.8 built for x86_64-linux-thread-multi
  • Python - Python 2.4.4 (#1, Oct 23 2006, 13:58:18)
  • C++ - gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)
  • Java - Java(TM) SE Runtime Environment (build 1.6.0_10-ea-b10)

You’ll notice I’m adding Java into the mix for fun. Here are the results, over 10 runs, on the Intel dual-core 1.80GHz machine with 2GB of RAM currently running this website:

mergesort-performance.png

Lang	Average	Min	Max
PHP	8.8325	8.637	9.303
Ruby	7.2896	7.143	7.729
Perl	4.3231	4.262	4.428
Python	3.3465	3.289	3.417
C++	0.5638	0.53	0.609
Java	0.4062	0.262	0.551

There are a couple of important conclusions to note here that differ significantly from Hongli Lai’s:

  • PHP is 21% slower than Ruby, not 41% as in his benchmark
  • Python is 29% faster than Perl, not 17% as in his benchmark
  • Java runs this 39% faster than C++, and 2100% faster than PHP
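
These percentages can be recomputed from the Average column of the table. The Python sketch below is just that arithmetic; `pct_slower` is a helper name made up here, and I am assuming the table values are run times in seconds.

```python
# Average run times (assumed to be seconds), copied from the table above.
avg = {"PHP": 8.8325, "Ruby": 7.2896, "Perl": 4.3231,
       "Python": 3.3465, "C++": 0.5638, "Java": 0.4062}

def pct_slower(slow, fast):
    """Percent extra time `slow` takes relative to `fast`."""
    return (avg[slow] / avg[fast] - 1) * 100

print(f"PHP vs Ruby:    {pct_slower('PHP', 'Ruby'):.0f}% more time")
print(f"Perl vs Python: {pct_slower('Perl', 'Python'):.0f}% more time")
print(f"C++ vs Java:    {pct_slower('C++', 'Java'):.0f}% more time")
print(f"PHP vs Java:    {avg['PHP'] / avg['Java']:.1f}x the time")
```

The comparison is on time taken, so "X% faster" here means the slower language needs X% more time for the same work.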

So, PHP is slower than Ruby, but not quite as slow as Hongli Lai would have you believe. Python is the fastest scripting language in this benchmark, while Java is the fastest language all around, and is incredibly, incredibly fast. Maybe all of our code should start using Java!

* NOTE: I am ignoring the obvious deficiencies of this micro-benchmark and just trying to replicate it. What I’ve found is that there are significant discrepancies between Hongli Lai’s run of the tests and my own, probably owing to slightly different versions of the components involved. Also, if I make some trivial optimizations to the loops in the PHP script, I can get it to run in about 2.4s, faster than every other scripting language here and behind only C++ and Java. Then again, just calling sort() is faster by another two orders of magnitude… but still half as slow as Java’s built-in sort… and two orders of magnitude slower than perl’s built-in.
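
To get a feel for the gap between a hand-written mergesort and a language's built-in sort, here is a small Python sketch. It is not Hongli Lai's benchmark code; the input size and repeat count are arbitrary choices of mine, and the timings will vary from machine to machine.

```python
import random
import timeit

def merge_sort(items):
    """Plain top-down mergesort returning a new sorted list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    # Repeatedly take the smaller head element until one side runs out.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

data = [random.random() for _ in range(20_000)]  # input size is arbitrary
hand = timeit.timeit(lambda: merge_sort(data), number=5)
builtin = timeit.timeit(lambda: sorted(data), number=5)
print(f"hand-written: {hand:.3f}s  built-in: {builtin:.3f}s")
```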

Benchmarking Wordpress with Apache Bench

Posted in Blogging, Performance, Scalability, WP by Elliott Back on January 14th, 2008.

A lot of people talk about Wordpress performance, and how to get a webserver to perform as efficiently as possible. However, without a quantifiable methodology for testing website performance, you can’t actually talk about it. ApacheBench (ab) is the solution to the problem of measuring website performance. What is ApacheBench? The man page provides a suitable answer:

ab - Apache HTTP server benchmarking tool

ab is a tool for benchmarking your Apache Hypertext Transfer Protocol (HTTP) server. It is designed to give you an impression of how your current Apache installation performs. This especially shows you how many requests per second your Apache installation is capable of serving.

If you have installed apache or apache-devel, you should be able to simply invoke ab by typing it on the command line. For example, to benchmark my own site here, I would write:

[root ~]# ab -n 10000 -c 100 http://elliottback.com/wp/

This says “make 10,000 requests to host elliottback.com via http for /wp/, keeping 100 requests in flight at a time.” The result of this is the following report:

This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking elliottback.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests

Server Software: Apache/2.2.6
Server Hostname: elliottback.com
Server Port: 80

Document Path: /wp/
Document Length: 34331 bytes

Concurrency Level: 100
Time taken for tests: 13.596345 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 346230000 bytes
HTML transferred: 343310000 bytes
Requests per second: 735.49 [#/sec] (mean)
Time per request: 135.963 [ms] (mean)
Time per request: 1.360 [ms] (mean, across all concurrent requests)
Transfer rate: 24868.08 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    1.6      0     20
Processing:     8  134   12.7    132    190
Waiting:        4  134   12.7    132    190
Total:         16  134   12.1    132    190

Percentage of the requests served within a certain time (ms)
50% 132
66% 134
75% 136
80% 137
90% 145
95% 160
98% 175
99% 179
100% 190 (longest request)

According to these numbers, my dual core server can do about 735 requests per second, fulfilling each in about 136ms on average. That’s pretty fast, probably because I know the secrets of Wordpress Optimization. If you make every layer as fast as it can be, and cache heavily, you too can see lightning-fast Wordpress installations!
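
The headline figures in the report all follow from three inputs: the request count, the concurrency level, and the total time taken. The Python sketch below redoes that arithmetic with the values from the report above; the variable names are mine, not ab's.

```python
total_requests = 10_000      # the -n value
concurrency = 100            # the -c value
elapsed = 13.596345          # "Time taken for tests", in seconds
total_bytes = 346_230_000    # "Total transferred"

rps = total_requests / elapsed
per_request_ms = concurrency / rps * 1000   # mean time one client waits per request
across_all_ms = 1000 / rps                  # mean time across all concurrent requests
transfer_kb = total_bytes / elapsed / 1024  # approximate Kbytes/sec received

print(f"Requests per second: {rps:.2f}")
print(f"Time per request:    {per_request_ms:.3f} ms (mean)")
print(f"Time per request:    {across_all_ms:.3f} ms (across all concurrent requests)")
print(f"Transfer rate:       about {transfer_kb:,.0f} Kbytes/sec")
```

The two "Time per request" lines differ by exactly the concurrency level: with 100 requests in flight, each one takes about 136ms, but the server finishes one roughly every 1.36ms.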

Mark Cuban’s P2P Ideas Suck

Posted in Celebrities, P2P, Performance, Scalability, bit torrent, bittorrent by Elliott Back on November 25th, 2007.

In a three-part rant about peer-to-peer technologies (1, 2, 3), Mark Cuban demands that peer-to-peer technologies “die a quick death” in order to “speed up [his own] internet connection.” He suggests that “Google Video is a far better solution for audio and video distribution than any P2P solution” and that cable companies “charge for upstream bandwidth usage.”

Guess what–I already get charged for all the bandwidth I use, either up or down. When Verizon strings a fiberoptic cable to my home, I’m getting a certain amount of fixed capacity into the greater internet at large. If I want to trade a little upstream capacity for greater downstream capacity, that’s my call! Have you ever noticed that downloading over http is typically slow because there are 100s of clients and 1 host? If I download the same information over bittorrent, I can sustain 12Mbs because everyone is a server–including me. Distributed protocols, such as the ones powering Amazon Dynamo or bittorrent, are more efficient, cost effective, and fault tolerant than single-server models.

Reactions around the blogosphere indicate that Mark Cuban’s thoughts on P2P are nonsensical rubbish. Mashable calls him “a guy who does not understand how P2P works, and yet he wants it shut down.” Ars Technica notes that “if users who are currently saturating their connections with BitTorrent start saturating their connections with Google Video content, the end result is more or less the same.” And a slashdotter comments, “Just imagine how fast the internet would be if there were no content to view. After P2Ps gone, get rid of all these freeloading websites, emails, etc. and it will be blisteringly fast.”

My guess is that billionaire Mark Cuban has a slow, shared cable internet connection at home, the modern equivalent of a party line. This might lead him to confuse his own slow internet connection with a greater systemic problem. What he should be complaining about is why Verizon hasn’t strung fiber in his area yet.
