== overall notes == Stats generation would be separated into two steps: 1) Adding data from new logs to the DB; 2) Generating stats from existing DB. The DB would be able to hold log data for ALL rules and symmetries. We would not, however, necessarily add data from all rules to the DB. (Day & Night in particular would cause it to balloon.) The DB would be SQLite for now; if it gets too large it could be moved to a proper DB server like PostgreSQL with minimal effort. Stats could be generated for either a given rule (all symmetries combined), or a given rule AND symmetry. (There is no reason why allowing all rules/symmetries combined would make sense, though it would be possible.) Object names etc. would live in the DB; files like objectnames.txt would be used to feed the DB. There'd be a separate script to do this. Edits would still be done in these files, not the DB. GENERALLY, THE DB WILL NOT HOLD ANY INFORMATION THAT CANNOT BE REGENERATED FROM EXTERNAL FILES. That means logs for census data, and objectnames.txt etc. for the objectdata table. == Tables == NOTE: if a column is marked "?" (e.g. "?type"), I'm not sure it should be included. Table hauls: a list of hauls contained in the database. Used to check what hauls are new and therefore need to be committed. seed (primary key) filename timestamp rule symmetry objects soups ... Table haulcensus: census data for each haul. Is this necessary? It's basically a copy of the log in structured form; do we need this in the DB? seed object (apgcode) ?prefix ?type ?p count Table census: main census data. Has a row for each object/rule/symmetry combination. Should be joined with objectdata if objects' names etc. are desired. object (apgcode) count rule symmetry firstseen (timestamp) lastseen (timestamp) Table censusYYYYMMDD: census data for date YYYY-MM-DD. Like table census. Table objectdata: auxilliary object data. Contains object names, types ("xp", "xq" etc.), prefixes ("xp2", "xq4" etc.), p's (population/period); might also contain such things as heat, volatility etc. apgcode (primary key) name prefix type p ... == Indexes === TBD == Adding hauls == 1. Get list of hauls from the database. SELECT filename FROM HAULS ORDER BY filename; 2. Get list of logs. 3. Figure out which logs are new and need to be committed. 4. Start a DB transaction. 5. For each new log... 5a. Read log data. 5b. Add entries to haulcensus table (assuming this table is indeed included in the DB's design). 6. Aggregate log data. 7. For each object/rule/symmetry: 7a. Find out if object is already in table census. 7a1. If so, update its count and lastseen. 7a2. If not: 7a2a. Insert it. 7a2b. Find out if object is already in table objectdata. 7a2b1. If not: determine basic characteristics (prefix, type, p) and add it. 8. Aggregate log data by date. 9. For each day: 9a. Create if not exists table censusYYYYMMDD. 9b. For each object/rule/symmetry for that day: proceed as above. Skip updating table objectdata, though, all new objects are already guaranteed to have entries by this point. 10. Commit DB. == Creating reports == 1. All DB queries should be qualified using WHERE rule = ... AND symmetry = ... if requested by user. 2. Select overall data from table hauls. SELECT count(*) AS hauls, sum(objects) AS objects, sum(soups) AS soups FROM hauls; 3. Select distinct objects from table census. SELECT count(distinct(object)) AS distinctobjects FROM census; 4. Use this to compute average soups/haul and objects/soup. 5. For main census, basically select from table census, joined with table objectdata for names. 6. For prefix reports ("xq4" etc.), select from table census joined with table objectdata, WHERE prefix = ... . 7. For diary: 7a. Use table hauls and column timestamp to get number of hauls etc. for the day. 7b. Use table censusYYYYMMDD to get interesting objects for the day. 7c. Use table census and column firstseen to get new objects for the day.