Why Benchmarks

* Figure out how well a (database) system performs.
* Results are specific to the hardware configuration and the specific parameters tuned.
* The user cares about how well "my application" will do -- which system should I buy?
* Cannot test with the actual application: too complex, too costly.
* Use standard benchmark numbers as a guide.

Required Characteristics

* Must be simple to specify and implement.
* Must be representative of some class of real applications.
* Must be carefully specified -- cheating should be hard. Companies consider bragging rights very important, and will often tune their product to suit the benchmark, at times even at the cost of other applications.

Three Benchmarks

* TP
* WISS (Wisconsin)
* XMark

Transaction Processing

* The most famous benchmark in databases.
* Proposed by Jim Gray and friends, but published as by "anon et al." so as to be able to dodge some bullets.
* Since codified by the Transaction Processing Performance Council as TPC-A.
* TPC-A and TPC-B are no longer interesting benchmarks, since you can get multiple xacts per dollar.

TPC Benchmarks

* TPC-C deals with a relatively complex operational database.
* TPC-D dealt with a data warehouse; it evolved into TPC-R (business reporting) and TPC-H (ad hoc queries).
* TPC-W deals with a web back-end database.
* Visit http://www.tpc.org -- test-data-generating code and benchmark specifications are available for free download.
* All benchmarks have a "scale factor" as a central feature.

TP1 Benchmark

* Supposedly a stylized statement of a cash-withdrawal (or check-cashing) transaction. Banks complain that it is not quite how they do it in practice.
* Not a water-tight spec; the TPC fixed much of this.
* But used very widely, and central to RDBMS development in the early days.

How to Measure Things

* Time: wall-clock time on an unloaded system.
* Cost: compute the 5-year cost of ownership, using straight-line depreciation and a zero interest rate, to determine the cost per second of execution time on the system.
* Commercial benchmarks typically do not count cost per second, but rather total system cost.
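The cost metric above is easy to state concretely. Here is a minimal sketch, assuming straight-line depreciation over 5 years and a zero interest rate as stated; the function name and the $1,000,000 example price are illustrative, not from the benchmark specs.

```python
# Cost-per-second metric: 5-year cost of ownership, straight-line
# depreciation, zero interest rate -- every second of the system's
# lifetime costs the same amount.

SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years for simplicity

def cost_per_second(total_system_cost, years=5):
    """Dollars per second of execution time on the system."""
    return total_system_cost / (years * SECONDS_PER_YEAR)

# e.g. a hypothetical $1,000,000 system:
print(f"${cost_per_second(1_000_000):.6f} per second")
```

Dividing a transaction's elapsed time by this figure gives its dollar cost, which is how price/performance numbers like $/TPS were derived.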
* Costs not counted: outages, software development, ...

The Benchmarks

* Sort: sort one million 100-byte records stored on disk, using the first ten bytes as the key.
* Scan: scan one million 100-byte records in 1000 equal transactions, each of which reads and writes back 1000 records, with locking.

The DebitCredit Benchmark

* Transactions per second: TPS = the number of DebitCredit transactions that can be run on the system per second, with a response time of less than 1 second for 95% of the xacts.
* Do not worry about system start-up, crash recovery, etc. (though all are required).
* Rewards simplicity -- you don't pay a performance penalty for fancy functions.

100 TPS Setup

* 10,000 tellers, with 100 sec. think time each.
* 1000 branches.
* 10,000,000 accounts.
* One 100-byte record for each of the above, in 3 separate tables, randomly accessed.
* One 50-byte history record per transaction, in a 10 GB sequential file.

The Transaction

* Read message from terminal.
* Read-modify-write account.
* Write history.
* Read-modify-write teller.
* Read-modify-write branch.
* Write message to terminal.
* Pick a branch, and a teller in that branch, randomly. Pick a random account in that branch 85% of the time.

Wisconsin Benchmark

* An engineer's benchmark, as opposed to a user's benchmark.
* Does not model any application -- rather, it is a stylized synthetic database with queries designed to test specific features of the database system.
* Minimizes randomness by deriving attributes in a stylized way.
* Actually predates the TP1 benchmark (1983 vs. 1985)!!

XML Benchmarks

* What is the application? What is the data set?
* Some suggestions, notably XMark. http://www.xml-benchmark.org/
* Customizable benchmarks -- ToXgene: http://www.cs.toronto.edu/tox/toxgene (used in XBench at Waterloo).
* MBench is an engineer's benchmark for XML, designed after the Wisconsin benchmark but, of necessity, more complex. http://www.eecs.umich.edu/db/mbench

Parallelism

* Measure how much performance improves by using n processors. Ideally you would like it to grow by a factor of n.
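The DebitCredit transaction's database work can be sketched as below. This is an illustrative in-memory toy, not the benchmark implementation: tables are plain dicts, the counts are scaled down from the 100 TPS setup, and all names are assumptions. It does show the read-modify-write sequence and the 85% same-branch account rule.

```python
import random

# Scaled-down toy configuration (the real 100 TPS setup is 1000 branches,
# 10,000 tellers, 10,000,000 accounts).
N_BRANCHES, TELLERS_PER_BRANCH, ACCOUNTS_PER_BRANCH = 10, 10, 1_000

branches = {b: 0 for b in range(N_BRANCHES)}
tellers  = {t: 0 for t in range(N_BRANCHES * TELLERS_PER_BRANCH)}
accounts = {a: 0 for a in range(N_BRANCHES * ACCOUNTS_PER_BRANCH)}
history  = []  # one (50-byte) history record per transaction

def debit_credit(delta):
    # Pick a branch, and a teller in that branch, randomly.
    branch = random.randrange(N_BRANCHES)
    teller = branch * TELLERS_PER_BRANCH + random.randrange(TELLERS_PER_BRANCH)
    if random.random() < 0.85:  # 85%: account in the teller's own branch
        account = branch * ACCOUNTS_PER_BRANCH + random.randrange(ACCOUNTS_PER_BRANCH)
    else:                       # 15%: any account
        account = random.randrange(len(accounts))
    accounts[account] += delta                        # read-modify-write account
    history.append((branch, teller, account, delta))  # write history
    tellers[teller]  += delta                         # read-modify-write teller
    branches[branch] += delta                         # read-modify-write branch
    return account

debit_credit(-100)  # e.g. a $100 withdrawal
```

Terminal I/O, locking, and durability are omitted here, but they are exactly what the real benchmark stresses.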
* In practice, it usually grows by something less than that.
* Speedup = single-processor time / multi-processor time. Compare this against the number of processors used.
* Scaleup = elapsed time measured as the problem size is scaled linearly with the number of processors.
* Sizeup = elapsed time measured as the problem size is scaled linearly (with no change to the hardware configuration).
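The two ratio metrics above can be sketched directly; the example timings are made-up numbers, not measurements.

```python
def speedup(t_single, t_multi):
    """Same problem on 1 vs. n processors: ideally speedup == n."""
    return t_single / t_multi

def scaleup(t_base, t_scaled):
    """Problem size and processor count both multiplied by n:
    ideally elapsed time stays flat, i.e. scaleup == 1."""
    return t_base / t_scaled

# e.g. 100 s on 1 processor vs. 30 s on 4 processors:
print(speedup(100.0, 30.0))  # ~3.33, below the ideal of 4
```

Sizeup is the same ratio as scaleup, except that only the problem size grows while the hardware stays fixed, so elapsed time is expected to rise roughly linearly rather than stay flat.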