== overall notes ==

Stats generation would be separated into two steps:

1) Adding data from new logs to the DB;
2) Generating stats from existing DB.

The DB would be able to hold log data for ALL rules and symmetries. We would
not, however, necessarily add data from all rules to the DB. (Day & Night in
particular would cause it to balloon.) The DB would be SQLite for now; if it
gets too large it could be moved to a proper DB server like PostgreSQL with
minimal effort.

Stats could be generated for either a given rule (all symmetries combined),
or a given rule AND symmetry.  (There is no reason why allowing all
rules/symmetries combined would make sense, though it would be possible.)

Object names etc. would live in the DB; files like objectnames.txt would be
used to feed the DB. There'd be a separate script to do this. Edits would
still be done in these files, not the DB.

GENERALLY, THE DB WILL NOT HOLD ANY INFORMATION THAT CANNOT BE REGENERATED
FROM EXTERNAL FILES.  That means logs for census data, and objectnames.txt
etc.  for the objectdata table.

== Tables ==

NOTE: if a column is marked "?" (e.g. "?type"), I'm not sure it should be
included.

Table hauls: a list of hauls contained in the database. Used to check what
hauls are new and therefore need to be committed.

	seed (primary key)
	filename
	timestamp
	rule
	symmetry
        objects
        soups
	...

Table haulcensus: census data for each haul. Is this necessary? It's
basically a copy of the log in structured form; do we need this in the DB?

	seed
	object (apgcode)
	?prefix
	?type
	?p
	count

Table census: main census data.  Has a row for each object/rule/symmetry
combination.  Should be joined with objectdata if objects' names etc.  are
desired.

	object (apgcode)
	count
	rule
	symmetry
	firstseen (timestamp)
	lastseen (timestamp)

Table censusYYYYMMDD: census data for date YYYY-MM-DD. Like table census.
	
Table objectdata: auxilliary object data.  Contains object names, types
("xp", "xq" etc.), prefixes ("xp2", "xq4" etc.), p's (population/period);
might also contain such things as heat, volatility etc.

	apgcode (primary key)
	name
	prefix
	type
	p
	...

== Indexes ===

TBD

== Adding hauls ==

1. Get list of hauls from the database.

    SELECT filename FROM HAULS ORDER BY filename;

2. Get list of logs.
3. Figure out which logs are new and need to be committed.
4. Start a DB transaction.
5. For each new log...
5a. Read log data.
5b. Add entries to haulcensus table (assuming this table is indeed included
    in the DB's design).
6. Aggregate log data.
7. For each object/rule/symmetry:
7a. Find out if object is already in table census.
7a1. If so, update its count and lastseen.
7a2. If not:
7a2a. Insert it.
7a2b. Find out if object is already in table objectdata.
7a2b1. If not: determine basic characteristics (prefix, type, p) and add
       it.
8. Aggregate log data by date.
9. For each day:
9a. Create if not exists table censusYYYYMMDD.
9b. For each object/rule/symmetry for that day: proceed as above. Skip
    updating table objectdata, though, all new objects are already
    guaranteed to have entries by this point.
10. Commit DB.

== Creating reports ==

1. All DB queries should be qualified using WHERE rule = ... AND symmetry
   = ... if requested by user.
2. Select overall data from table hauls.

    SELECT count(*) AS hauls, sum(objects) AS objects, sum(soups) AS soups FROM hauls;

3. Select distinct objects from table census.

    SELECT count(distinct(object)) AS distinctobjects FROM census;

4. Use this to compute average soups/haul and objects/soup.
5. For main census, basically select from table census, joined with table
   objectdata for names.
6. For prefix reports ("xq4" etc.), select from table census joined with
   table objectdata, WHERE prefix = ... .
7. For diary: 
7a. Use table hauls and column timestamp to get number of hauls etc. for the day.
7b. Use table censusYYYYMMDD to get interesting objects for the day.
7c. Use table census and column firstseen to get new objects for the day.