Keeping statistics
From Hitchwiki
So you want to keep track of your rides (see also Statistics) and, once you've managed to get safely back home, do something with the data you've collected? If yes, read on...
This preliminary and not-yet complete article is based on the notes I (Prino) have used since I started recording my rides on 16 June 1980 @ 7:47. The notes I use are pretty simple, you can obviously adapt them to your own requirements.
Contents |
A possible form to record data
The forms I use to record my data look like this and properly spaced you can put three columns with four rows on a sheet of A4 (210x297mm) paper. Fitting three columns on a US "Letter" size sheet (215.9x279.4mm) of paper might be possible if the column widths are reduced. I save my notes in 17-ring binders.For those who don't know Dutch, the fields on the note translate into:
| Datum | Date |
| Tijd | Time |
| KM-stand | Odometer |
| Plaats | Place |
| Vertrek | Departure |
| Aankomst | Arrival |
| Snelheid | Speed |
| Opmerkingen | Notes |
The key to the detail of the produced statistics is the Notes section. I've simply called it Notes as you are unlikely to add specific notes to all of your rides...
The fields
What follows is a description of the various fields in the above format. There is one field that might seem to be missing, 'Waiting Time', but it should be obvious that it's automagically included as the difference between the arrival time of one ride and the departure time of the next ride, and in those cases where something happens in between, the Notes section comes to the rescue.
Date
Pretty obvious.
If you use the ISO8601 format (YYYY-MM-DD), it's easy to extend this to rides spanning multiple days by modifying the format into YYYY-MM-DD/DD.
Departure/Arrival
Again pretty obvious.
The three columns contain the time, odometer and place of departure and arrival. In cases where no odometer is available, or where it doesn't work, you can use Google maps to determine distances, I've found that it is usually accurate to the nearest kilometer
The unnamed row below 'Departure' contains the total time and distance of the ride.
Speed
The contents of this field depends on individual preferences. I put the real speed in it, i.e. the distance divided by the actual driving time, which is the arrival time minus the departure time minus any time recorded for stops.
Notes
This is a free-format field that you can use for any purpose you like. Why free format? Easy, it's unlikely that you would need a specific format for all of your rides, e.g. why include specific headings for rides that cross borders or span more than one day, when the number of such rides is likely to be pretty small compared to 'normal' rides.
Here's a non-exhaustive list of things I've been using this field for:
- The type of the driver. (Male, female, truck, van, taxi, etc, you can make it as detailed as you like)
- The nationality of the driver, I use the international license plate code, but only record this information if the driver is not from the country where they pick me up.
- Details of stops (reason: meal, rest, toilet, etc) and times. Recording of these times allows you to calculate the true driving speed!
- Details (i.e. time and odometer reading) at border crossings, which allows you to create per-country totals.
- Odometer readings at midnight, which allows you to create per-calendar-day totals.
- Special vehicles (or even every type/model of vehicle/car).
- On the last ride of a day I add a 'Total distance/time/speed' entry.
There are probably lots of other things you might want to put into it.
If the Notes section is too small (and I've had rides with a dozen stops and going across five borders in two days), I usually continue on the back of the form - see the two samples somewhere on this page.
Storing the data on a PC
This is likely to be the most important decision you will have to make. There are (at least) three options:
- a (structured) text file, to be processed by a user-written program
- a spreadsheet
- a database
Each of these options has its pros and cons, here are some details:
Text file
I, (Prino), use the text file option with a few programs I've written myself. The advantage of using this format is the fact that it allows me total flexibility, but it has a pretty big disadvantage in that you have to think very carefully about the format you plan to use: it should be able to cater for future changes without you having to completely rewrite your programs. My format, described later, was developed over about 23 years and despite that fact that I've moved to a new format after a few years, the result of not giving the format enough thought initially, is rather cryptic due to more additions since adopting it!
A spreadsheet
If you're well versed in spreadsheets (or if not, try LibreOffice, it's free) you might want to consider using one to process your data. It will have the big advantage that you can insert or delete columns in your source data and the program will automagically update the references in all other cells and/or sheets. Combined with the many conditional functions, you should (probably) be able to produce any statistics you like, although some of the more esoteric ones my program creates will be pretty hard (or even impossible) to replicate.
A database
What was written about spreadsheets also holds true for databases. Not having used any PC database programs, I cannot recommend any, but there are plenty of free ones, LibreOffice and MySQL, to name just two of the more well known ones. Creating your statistics will mean writing queries (most likely in the fairly easy to learn language SQL), but given the non-procedural nature of this language, some results that can be created with a self-written program or a spreadsheet may be hard or impossible to recreate.
Prino's original program
As mentioned above, and being a programmer by profession, I selected the first method of storing the data, a text file. The first 60(!) versions of the program were written in Turbo Pascal V3.01a and until about version 20 they used 'version 1' of a simple C(omma) S(eparated) V(alue) file with the data. They could handle rides passing through multiple countries and spanning more than one day, but did not know anything about ferry crossings, stops or timezones, to name but a few of the things that arrived later...
Given that the old format became obsolete a long time ago, I've not included any details about it, but its output mimic'ed my manually created five tables per trip, containing:
- a table with the distribution of the distances per ride. Initially in intervals of 100 km, but due to the overwhelming number of rides shorter than 100 km, this interval was soon split up into four additional intervals of 25 km, and the, as expected rather small number of, rides over 1,000 km long are split into intervals of 1,000 km.
- a table of distances for each type of driver.
- a table of distances per country.
- a table with a count of the number of drivers per nationality.
- a table with various maxima, minima and averages for the rides and days of the trip, i.e. highest/lowest/average speed per ride/day, greatest/smallest/average distance per ride/day, longest/shortest/average time per ride/day.
Simple, uncomplicated and one might assume that most hitch-hikers would leave it at this...
Prino's current program
The current program is written in Virtual Pascal V2.1 build 279 (or PL/I, should you want to run it on IBM's z/OS). It is licensed under the provisions of the GPL V3. The authenticity verified WinRAR archive containing the source and executable files can be found in the "Files" section of the "hitching" group on Yahoo! Note that you have to be a member of the group to access the Files section!
Data format used by Prino's current program
The 'simple' format was used until the end of 1994. Due to the fact that I wanted to add some additional statistics to the output files, it was changed into something a bit more logical, although some people might find otherwise. (And they are right, it's a right-royal mess due to more additional requirements, and I would like to simplify some of the more esoteric uses of punctuation, but that won't happen until I get back onto the 'big iron' with its superb debugging facilities!)
The current format, split into two parts to avoid scrolling, looks like this:
....v....1....v....2....v....3....v....4....v....5....v....6....v....7....v....8....v....9.... 999, 9999, AAA, 99999.9, HHH.MM, 999.9, NAT, TYPE, CTY, HH.MM, S, HH.MM, HH.MM, YYYY-MM-DD, | | | | | | | | | | | | | | a b c d e f g h i j k l m n v....0....v....1....v....2....v....3....v....4....v....5....v....6....v.... 999999.9, DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD , AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | | | o p q
and the fields used in it are:
| a | number of the trip, up to three digits | ||||||||
| b | number of the ride with the trip, up to four digits | ||||||||
| c | 'day'
|
||||||||
| d | total distance for this ride, up to 99999.9 km | ||||||||
| e | total driving time for this ride | ||||||||
| f |
|
||||||||
| g | nationality of the driver, three characters | ||||||||
| h |
|
||||||||
| i | the country for this ride, either
|
||||||||
| j | wait for this ride, blank for first ride of the day (unless you decide otherwise...) Use hh:mm instead of hh.mm for waits caused by departure from a ferry terminal where you haven't been able to get a ride on the ferry (and may have to wait until the next ferry...) |
||||||||
| k | split type indicator, possible values are:
If there are multiple split indicators present for a ride, they must be in the order blank, '#', '*', '!'. |
||||||||
| l |
Use hh:mm instead of hh.mm for waits caused by departures after a ferry crossing. |
||||||||
| m |
Notes:
|
||||||||
| n | date for the ride, in ISO8601 format. Required for the first ride of the day, and for every k = '#' record for multiple day rides, except the first, unless the multiple day ride is the first ride of the day. |
||||||||
| o | odometer at departure for this ride | ||||||||
| p | place of departure, I (currently) limit it to 31 characters, the length of 'Aire de Verdun St.Nicholas Sud' | ||||||||
| q | place of arrival, same limit | ||||||||
Notes:
|
Meta data
As I didn't want to make the format of the data even more complex than it already is, I decided to allow for comments and meta-data to be embedded into the input file. Both comments and meta-data start with a '{' and can be up to 255 characters long.
The format of the comments/meta-data is:
| {Z?CTY sNN CTY sNN CTY sNN ... ....v....1....v....2....v....3 |
These lines contain timezone information. The format of these comments is:
Pos 4..11 can be repeated up to 31 times. Should a trip pass through more than 31 countries or should you wish to include all countries in one single place, additional '{Z+...' lines must be used. |
||||||||||||||
| {< aaaa bbbb cccc - lorum ipsum ....v....1....v....2....v....3... {> - lorum ipsum |
These lines allow the rides of a second person, provided they are an exact subset of the rides of the first person, to be extracted into their own file. The essential parts of the lines are:
|
Note: Any line starting with '{' that does not fit into any of the above categories is ignored completely, i.e. treated as a comment!
The results of Prino's current program
The current program produces rather a lot more output than the five tables per trip! In fact it now produces four files and an optional additional post-processor program that translates the output into .RTF format creates two additional files with two tables sorted in various other orders.
The summary output file: 'summ.h-h'
This file contains no less than 86 tables (some of them broken into several parts because they would otherwise require A3 or A2 size paper). Here's the full list, the examples given are based on the first two trips of my hitch-hiking career:
- two tables of general totals for every trip
- per individual trip
- as a running cumulative total
- a table of totals for all distances
- a table of totals for all types
- a table of totals for all countries
- a table of totals for all nationalities
- a table of totals for all speeds
- a table of totals for all waits
- a table of the statistical waiting time distribution
- a table of all in-ride waits per category
- two tables of waits relating to ferries
- waits after ferry crossings
- time spent on ferries
- three tables of pick-ups
- per nationality per country
- per country per type
- per nationality per type
- a table with the distribution of departure times
- a table with the first and last ride for all distances
- a table with the first and last ride for all types
- a table with the first and last ride for all countries
- a table with the first and last ride for all nationalities
- a table with the first and last ride for all speeds
- two tables of waits per trip, split in short and long waits
- per individual trip
- as a running cumulative total
- a table of waits per country, split in short and long waits
- a table of waits per weekday, split in short and long waits
- a table of waits per month, split in short and long waits
- a table of waits per year, split in short and long waits
- a max/min/average summary for all rides
- a max/min/average summary for all days
- a max/min/average summary for all types
- a max/min/average summary for all nationalities
- a max/min/average summary for all countries
- a table of rides per country, split in internal and border crossing rides
- four tables for the max/min speed & max/min rides for a given number of distances
- four tables for the max/min speed & max/min distance for a given number of rides
- four tables for the maximum number of rides exceeding a number of selected velocities, maximized for the number of rides and the distance,
- one set of two tables for absolute speed
- one set of two tables for average speed
- four tables for the maximum number of rides exceeding a number of selected lengths, maximized for the number of rides and the distance,
- one set of two tables for absolute distance
- one set of two tables for average distance
- a max/min/average summary for all rides per year
- a max/min/average summary for all days per year
- a table of totals for all distances per day
- a table of totals for all speeds per day
- a table with the first and last day for all distances
- a table with the first and last day for all speeds
- four tables for the max/min speed & max/min days for a given number of distances
- four tables for the max/min speed & max/min distance for a given number of days
- four tables for the maximum number of days exceeding a number of selected velocities, maximized for the number of days and the distance,
- one set of two tables for absolute speed
- one set of two tables for average speed
- four tables for the maximum number of days exceeding a number of selected lengths, maximized for the number of days and the distance,
- one set of two tables for absolute distance
- one set of two tables for average distance
- a table with totals per weekday
- a table with totals per month
- two tables of general totals per year
- a copy of table 1) for the years
- a list of trips and rides per year
- a table with usage of days per year
- a table of totals for consecutive days
- a table of totals for 24 hour periods
- a table of totals for 365 day periods
- a table of minimum number of rides needed for selected numbers of nationalities
- two tables (one per trip, one per year) with the number of types, countries and nationalities encountered during the trip/year, split in a total and a "new" column
- four tables (two per type, two per nationality) with
- the longest run of consecutive rides for a single type or nationality
- the longest run of consecutive rides without a type or nationality
The trip/type/country/nationality/year output file: 'lift.h-h'
This file contains
- four pages for every trip, containing the following tables:
- on page 1:
- a table with totals per day
- a table of totals for all distances
- a table of totals for all types
- a table of totals for all countries
- a table of totals for all nationalities
- a table of totals for all speeds
- a max/min/average summary for all rides and days
- on page 2:
- a table of totals for all waits
- a table of the statistical waiting time distribution
- a table of all in-ride waits per category
- a table of waits per country, split in short and long waits
- on page 3:
- three tables of pick-ups
- per nationality per country
- per country per type
- per nationality per type
- three tables of pick-ups
- on page 4:
- a max/min/average summary for all types
- a max/min/average summary for all nationalities
- a max/min/average summary for all countries
- two tables detailing distances per country
- a table listing the (partial) country distances in the order they were passed
- a table that just summarizes the distance per country
- on page 1:
- a 'Totals per type' separator page, followed by one page for every type, containing the following five tables:
- a table of totals for all distances
- a table of totals for all countries
- a table of totals for all nationalities
- a table of totals for all speeds
- a max/min/average summary for the type
The table with totals per type is not included on the per-type pages, as it would contain just a single line with the totals for that particular type. Instead the type is added into the heading of the totals-for-all-distances table.
- a 'Totals per country' separator page, followed by one page for every country, containing the following four tables:
- a table of totals for all waits, the country is added to the heading of this table
- a table of the statistical waiting time distribution
- a table with the distribution of departure times
- a max/min/average summary for the country, containing two rows, one for the non border-crossing rides, and one for the border-crossing rides
- a 'Totals per nationality' separator page, followed by one page for every nationality, containing the following tables:
- a table of totals for all distances
- a table of totals for all types
- a table of totals for all countries
- a table of totals for all speeds
- a max/min/average summary for the nationality
The table with totals per nationality is not included on the per-nationality pages, as it would contain just a single line with the totals for that particular nationality. Instead the nationality is added into the heading of the totals-for-all-distances table.
- a 'Totals per year' separator page, followed by five pages for every year, containing the following tables:
- on page 1:
- a table of totals for all distances
- a table of totals for all types
- a table of totals for all countries
- a table of totals for all nationalities
- a table of totals for all speeds
- a max/min/average summary for all rides and days
The year is added into the heading of the totals-for-all-distances table.
- on page 2:
- a table of totals for all waits
- a table of the statistical waiting time distribution
- a table of all in-ride waits per category
- a table of waits per country, split in short and long waits
- on page 3:
- three tables of pick-ups
- per nationality per country
- per country per type
- per nationality per type
- three tables of pick-ups
- on page 4:
- a max/min/average summary for all types
- a max/min/average summary for all nationalities
- a max/min/average summary for all countries
- a table that summarizes the distance per country, split in non border-crossing and border-crossing rides
- on page 5:
- a table of totals for all distances per day
- a table of totals for all speeds per day
- a table with totals per weekday
- a table with totals per month
- a table with progressive totals for 24 hour periods
- a table with the total period in days hitched during the year
- on page 1:
However, some logical pages may overflow physical pages, which is most likely to happen with the page that contains your most seen type, especially if you've visited a fair amount of countries.
The set of programs contains an optional program to remove all data that does not relate to the current trip from this file, leaving only
- five pages for the current trip
- a 'Totals per type' separator page, followed by one page for every type that appeared in the current trip,
- a 'Totals per country' separator page, followed by one page for every country that appeared in the current trip,
- a 'Totals per nationality' separator page, followed by one page for every nationality that appeared in the current trip, and
- a 'Totals per year' separator page, followed by five pages per year for the year(s) of the current trip,
which is kinder to trees, if you insist on also keeping the results on paper.
The daily summary output file: 'days.h-h'
This file contains one table with a line for every calendar day of every trip, detailing
- the number of the trip
- the day in the trip
- the distance hitched during the day
- the (actual) driving time during the day
- the average velocity for the day
- the date
A follow-up program will process this file, putting the original single column data in four columns of 70 rows. It also sorts the file into three additional orders, Distance, Time and Velocity. If the data is required to be in .RTF format, this program is required.
The formatted input data output file: 'trip.h-h'
This file merely puts the input data into a neat table (zapping the odometer and place of departure & arrival columns). The program will paginate trips that do not fit on A4 paper.