Keeping statistics

So you want to keep track of your rides (see also Statistics) and, once you've managed to get safely back home, do something with the data you've collected? If yes, read on...

This preliminary and not-yet complete article is based on the notes I (Prino) have used since I started recording my rides on 16 June 1980 @ 7:47. The notes I use are pretty simple, you can obviously adapt them to your own requirements.

A possible form to record data
The forms I use to record my data look like this and properly spaced you can put three columns with four rows on a sheet of A4 (210x297mm) paper. Fitting three columns on a US "Letter" size sheet (215.9x279.4mm) of paper might be possible if the column widths are reduced. I save my notes in 17-ring binders.

For those who don't know Dutch, the fields on the note translate into:

The key to the detail of the produced statistics is the Notes section. I've simply called it Notes as you are unlikely to add specific notes to all of your rides...

The fields
What follows is a description of the various fields in the above format. There is one field that might seem to be missing, 'Waiting Time', but it should be obvious that it's automagically included as the difference between the arrival time of one ride and the departure time of the next ride, and in those cases where something happens in between, the Notes section comes to the rescue.

Date
Pretty obvious.

If you use the ISO8601 format (YYYY-MM-DD), it's easy to extend this to rides spanning multiple days by modifying the format into YYYY-MM-DD/DD.

Departure/Arrival
Again pretty obvious.

The three columns contain the time, odometer and place of departure and arrival. In cases where no odometer is available, or where it doesn't work, you can use Google maps to determine distances, I've found that it is usually accurate to the nearest kilometer

The unnamed row below 'Departure' contains the total time and distance of the ride.

Speed
The contents of this field depends on individual preferences. I put the real speed in it, i.e. the distance divided by the actual driving time, which is the arrival time minus the departure time minus any time recorded for stops.

Storing the data on a PC
This is likely to be the most important decision you will have to make. There are (at least) three options:

 a (structured) text file, to be processed by a user-written program a spreadsheet a database 

Each of these options has its pros and cons, here are some details:

Text file
I, (Prino), use the text file option with a few programs I've written myself. The advantage of using this format is the fact that it allows me total flexibility, but it has a pretty big disadvantage in that you have to think very carefully about the format you plan to use: it should be able to cater for future changes without you having to completely rewrite your programs. My format, described later, was developed over about 23 years and despite that fact that I've moved to a new format after a few years, the result of not giving the format enough thought initially, is rather cryptic due to more additions since adopting it!

A spreadsheet
If you're well versed in spreadsheets (or if not, try LibreOffice, it's free) you might want to consider using one to process your data. It will have the big advantage that you can insert or delete columns in your source data and the program will automagically update the references in all other cells and/or sheets. Combined with the many conditional functions, you should (probably) be able to produce any statistics you like, although some of the more esoteric ones my program creates will be pretty hard (or even impossible) to replicate.

A database
What was written about spreadsheets also holds true for databases. Not having used any PC database programs, I cannot recommend any, but there are plenty of free ones, LibreOffice and MySQL, to name just two of the more well known ones. Creating your statistics will mean writing queries (most likely in the fairly easy to learn language SQL), but given the non-procedural nature of this language, some results that can be created with a self-written program or a spreadsheet may be hard or impossible to recreate.

Prino's original program
As mentioned above, and being a programmer by profession, I selected the first method of storing the data, a text file. The first 60(!) versions of the program were written in Turbo Pascal V3.01a and until about version 20 they used 'version 1' of a simple C(omma) S(eparated) V(alue) file with the data. They could handle rides passing through multiple countries and spanning more than one day, but did not know anything about ferry crossings, stops or timezones, to name but a few of the things that arrived later...

Given that the old format became obsolete a long time ago, I've not included any details about it, but its output mimic'ed my manually created five tables per trip, containing:

 a table with the distribution of the distances per ride. Initially in intervals of 100 km, but due to the overwhelming number of rides shorter than 100 km, this interval was soon split up into four additional intervals of 25 km, and the, as expected rather small number of, rides over 1,000 km long are split into intervals of 1,000 km. a table of distances for each type of driver. a table of distances per country. a table with a count of the number of drivers per nationality.</li> a table with various maxima, minima and averages for the rides and days of the trip, i.e. highest/lowest/average speed per ride/day, greatest/smallest/average distance per ride/day, longest/shortest/average time per ride/day.</li> </ul>

Simple, uncomplicated and one might assume that most hitch-hikers would leave it at this...

Prino's current program
The current program is written in Virtual Pascal V2.1 build 279 (or PL/I, should you want to run it on IBM's z/OS). It is licensed under the provisions of the GPL V3. The authenticity verified WinRAR archive containing the source and executable files can be found in the "Files" section of the "hitching" group on Yahoo! Note that you have to be a member of the group to access the Files section!

Data format used by Prino's current program
The 'simple' format was used until the end of 1994. Due to the fact that I wanted to add some additional statistics to the output files, it was changed into something a bit more logical, although some people might find otherwise. (And they are right, it's a right-royal mess due to more additional requirements, and I would like to simplify some of the more esoteric uses of punctuation, but that won't happen until I get back onto the 'big iron' with its superb debugging facilities!)

The current format looks like this:

....v....1....v....2....v....3....v....4....v....5....v....6....v....7....v....8....v....9....v....0....v....1....v....2....v....3....v....4....v....5....v....6....v.... 999, 9999, AAA, 99999.9, HHH.MM,  999.9, NAT, TYPE, CTY, HH.MM, S, HH.MM, HH.MM, YYYY-MM-DD, 999999.9, DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |   |      |     |         |        |    |    |     |     |    |    |      |        |          |                     |                                 | a    b      c     d         e        f    g    h     i     j    k    l      m        n          o                     p                                 q and the fields used in it are:

</li> </ul> <td align="center" valign="top">i the country for this ride, either <ul style="list-style-type:disc"> the three letter country abbreviation (when k is blank)</li> an '*' if this ride is split over several countries. The '*' must be on the first line of this ride</li> a blank, for those lines of the ride that deal with the various days (when k = '#')</li> three letter country abbreviation of the country being left (when h = '!B' and k = '!')</li> </ul> <td align="center" valign="top">j wait for this ride, blank for first ride of the day (unless you decide otherwise...)

Use hh : mm instead of hh. mm for waits caused by departure from a ferry terminal where you haven't been able to get a ride on the ferry (and may have to wait until the next ferry...) <td align="center" valign="top">k split type indicator, possible values are: If there are multiple split indicators present for a ride, they must be in the order blank, '#', '*', '!'. <td align="center" valign="top">l <ul> departure time for this ride, or </li> starting time of a k = '!' in-ride wait</li> time for a non-stopping border-crossing ride</li> </ul> Use hh : mm instead of hh. mm for waits caused by departures after a ferry crossing. <td align="center" valign="top">m <ul> arrival time for this ride, or </li> ending time of a k = '!' in-ride wait</li> </ul> Notes: <ul> the arrival time for a multiple day ride must be on the last k = '#' record for that ride</li> non-stopping border-crossing rides must not have an arrival time, unless the timezone changes</li> </ul> <td align="center" valign="top">n date for the ride, in ISO8601 format. Required for the first ride of the day, and for every k = '#' record for multiple day rides, except the first, unless the multiple day ride is the first ride of the day. <td align="center" valign="top">o odometer at departure for this ride <td align="center" valign="top">p place of departure, I (currently) limit it to 31 characters, the length of 'Aire de Verdun St.Nicholas Sud' <td align="center" valign="top">q place of arrival, same limit Notes: <ul> "blank" lines (consisting only of just a CR/LF) are allowed</li> lines starting with '{' in column 1 contain comments or meta-data required by LIFT or CHKDAT, see below for a description of the meta-data format</li> positions are FIXED</li> columns containing numbers are right-aligned</li> <li>columns containing text are left-aligned</li> <li>columns o, p and q are currently completely ignored by the main 'LIFT' program</li> </ul>

Meta data
As I didn't want to make the format of the data even more complex than it already is, I decided to allow for comments and meta-data to be embedded into the input file. Both comments and meta-data start with a '{' and can be up to 255 characters long.

The format of the comments/meta-data is:

Pos 4..11 can be repeated up to 31 times. Should a trip pass through more than 31 countries or should you wish to include all countries in one single place, additional '{Z+...' lines must be used.

The program can handle up to 256 countries, which is more than the current number of countries on Earth, but it still requires a change to handle fractional timezones, for countries like Iran (UTC +3.30), India (UTC +4.30) {&lt; aaaa bbbb cccc - lorum ipsum ....v....1....v....2....v....3... {&gt; - lorum ipsum These lines allow the rides of a second person, provided they are an exact subset of the rides of the first person, to be extracted into their own file. The essential parts of the lines are: Note: Any line starting with '{' that does not fit into any of the above categories is ignored completely, i.e. treated as a comment!

The results of Prino's current program
The current program produces rather a lot more output than the five tables per trip! In fact it now produces four files and an optional additional post-processor program that translates the output into .RTF format creates two additional files with two tables sorted in various other orders.

The summary output file: 'summ.h-h'
This file contains no less than 86 tables (some of them broken into several parts because they would otherwise require A3 or A2 size paper). Here's the full list, the examples given are based on the first two trips of my hitch-hiking career:

<ol> <li>two tables of general totals for every trip <ul> <li>per individual trip</li> <li>as a running cumulative total</li> </ul> </li> <li>a table of totals for all distances</li> <li>a table of totals for all types</li> <li>a table of totals for all countries</li> <li>a table of totals for all nationalities</li> <li>a table of totals for all speeds</li> <li>three tables of totals for all waits <ul> <li>waits split up in waiting time intervals</li> <li>a statistical analysis of waiting times</li> <li>waits split up in reason per wait</li> </ul> </li> <li>two tables of ferry related waits <ul> <li>waits after ferry crossings</li> <li>time spent on ferries</li> </ul> </li> <li>three tables of pick-ups <ul> <li>per nationality per country</li> <li>per country per type</li> <li>per nationality per type</li> </ul> </li> <li>a table with the distribution of departure times per weekday</li> <li>a table with the first and last ride for all distances</li> <li>a table with the first and last ride for all types</li> <li>a table with the first and last ride for all countries</li> <li>a table with the first and last ride for all nationalities</li> <li>a table with the first and last ride for all speeds</li> <li>two tables of waits per trip, split in short and long waits <ul> <li>per individual trip</li> <li>as a running cumulative total</li> </ul> </li> <li>a table of waits per country, split in short and long waits</li> <li>a table of waits per weekday, split in short and long waits</li> <li>a table of waits per month, split in short and long waits</li> <li>a table of waits per year, split in short and long waits</li> <li>a max/min/average summary for all rides</li> <li>a max/min/average summary for all days</li> <li>a max/min/average summary for all types</li> <li>a max/min/average summary for all nationalities</li> <li>a max/min/average summary for all countries</li> <li>a table of rides per country, split in internal and border crossing rides</li> <li>four tables for the max/min speed & max/min rides for a given number of distances</li> <li>four tables for the max/min speed & max/min distance for a given number of rides</li> <li>four tables for the maximum number of rides exceeding a number of selected velocities, maximized for the number of rides and the distance, <ul> <li>one set of two tables for absolute speed</li> <li>one set of two tables for average speed</li> </ul> </li> <li>four tables for the maximum number of rides exceeding a number of selected lengths, maximized for the number of rides and the distance, <ul> <li>one set of two tables for absolute distance</li> <li>one set of two tables for average distance</li> </ul> </li> <li>a max/min/average summary for all rides per year</li> <li>a max/min/average summary for all days per year</li> <li>a table of totals for all distances per trip</li> <li>a table of totals for all speeds per trip</li> <li>a table of totals for all distances per day</li> <li>a table of totals for all speeds per day</li> <li>a table with the first and last day for all distances</li> <li>a table with the first and last day for all speeds</li> <li>four tables for the max/min speed & max/min days for a given number of distances</li> <li>four tables for the max/min speed & max/min distance for a given number of days</li> <li>four tables for the maximum number of days exceeding a number of selected velocities, maximized for the number of days and the distance, <ul> <li>one set of two tables for absolute speed</li> <li>one set of two tables for average speed</li> </ul> </li> <li>four tables for the maximum number of days exceeding a number of selected lengths, maximized for the number of days and the distance, <ul> <li>one set of two tables for absolute distance</li> <li>one set of two tables for average distance</li> </ul> </li> <li>a table with totals per weekday</li> <li>a table with totals per month</li> <li>a table of general totals per year</li> <li>a table with first/last ride/trip per year</li> <li>a table with usage of days per year</li> <li>a table of totals for consecutive days</li> <li>a table of totals for 24 hour periods</li> <li>a table of totals for 365 day periods</li> <li>a table of minimum number of rides needed for selected numbers of nationalities</li> <li>two tables (one per trip, one per year) with the number of types, countries and nationalities encountered during the trip/year, split in a total and a "new" column</li> <li>four tables (two per type, two per nationality) with <ul> <li>the longest run of consecutive rides for a single type or nationality</li> <li>the longest run of consecutive rides without a type or nationality</li> </ul> </li> <li>a table of pickup times per 4-hour interval per country</li> </ol>

The trip/type/country/nationality/year output file: 'lift.h-h'
This file contains <ul> <li>four pages for every trip, containing the following tables: <ul> <li>on page 1: <ol> <li>a table with totals per day</li> <li>a table of totals for all distances</li> <li>a table of totals for all types</li> <li>a table of totals for all countries</li> <li>a table of totals for all nationalities</li> <li>a table of totals for all speeds</li> <li>a max/min/average summary for all rides and days</li> </ol> </li> <li>on page 2: <ol> <li>a table of totals for all waits</li> <li>a table of the statistical waiting time distribution</li> <li>a table of all in-ride waits per category</li> <li>a table of waits per country, split in short and long waits</li> </ol> </li> <li>on page 3: <ol> <li>three tables of pick-ups <ol> <li>per nationality per country</li> <li>per country per type</li> <li>per nationality per type</li> </ol> </li> </ol> </li> <li>on page 4: <ol> <li>a max/min/average summary for all types</li> <li>a max/min/average summary for all nationalities</li> <li>a max/min/average summary for all countries</li> <li>two tables detailing distances per country <ol> <li>a table listing the (partial) country distances in the order they were passed</li> <li>a table that just summarizes the distance per country</li> </ol> </li> </ol> </li> </ul> </li> <li>a 'Totals per type' separator page, followed by one page for every type, containing the following five tables: <ol> <li>a table of totals for all distances</li> <li>a table of totals for all countries</li> <li>a table of totals for all nationalities</li> <li>a table of totals for all speeds</li> <li>a max/min/average summary for the type</li> </ol> The table with totals per type is not included on the per-type pages, as it would contain just a single line with the totals for that particular type. Instead the type is added into the heading of the totals-for-all-distances table. </li> <li>a 'Totals per country' separator page, followed by one page for every country, containing the following four tables: <ol> <li>a table of totals for all waits, the country is added to the heading of this table</li> <li>a table of the statistical waiting time distribution</li> <li>a table with the distribution of departure times</li> <li>a max/min/average summary for the country, containing two rows, one for the non border-crossing rides, and one for the border-crossing rides</li> </ol> </li> <li>a 'Totals per nationality' separator page, followed by one page for every nationality, containing the following tables: <ol> <li>a table of totals for all distances</li> <li>a table of totals for all types</li> <li>a table of totals for all countries</li> <li>a table of totals for all speeds</li> <li>a max/min/average summary for the nationality</li> </ol> The table with totals per nationality is not included on the per-nationality pages, as it would contain just a single line with the totals for that particular nationality. Instead the nationality is added into the heading of the totals-for-all-distances table. </li> <li>a 'Totals per year' separator page, followed by five pages for every year, containing the following tables: <ul> <li>on page 1: <ol> <li>a table of totals for all distances</li> <li>a table of totals for all types</li> <li>a table of totals for all countries</li> <li>a table of totals for all nationalities</li> <li>a table of totals for all speeds</li> <li>a max/min/average summary for all rides and days</li> </ol> The year is added into the heading of the totals-for-all-distances table. </li> <li>on page 2: <ol> <li>a table of totals for all waits</li> <li>a table of the statistical waiting time distribution</li> <li>a table of all in-ride waits per category</li> <li>a table of waits per country, split in short and long waits</li> </ol> </li> <li>on page 3: <ol> <li>three tables of pick-ups <ol> <li>per nationality per country</li> <li>per country per type</li> <li>per nationality per type</li> </ol> </li> </ol> </li> <li>on page 4: <ol> <li>a max/min/average summary for all types</li> <li>a max/min/average summary for all nationalities</li> <li>a max/min/average summary for all countries</li> <li>a table that summarizes the distance per country, split in non border-crossing and border-crossing rides</li> </ol> </li> <li>on page 5: <ol> <li>a table of totals for all distances per day</li> <li>a table of totals for all speeds per day</li> <li>a table with totals per weekday</li> <li>a table with totals per month</li> <li>a table with progressive totals for 24 hour periods</li> <li>a table with the total period in days hitched during the year</li> </ol> </li> </ul> </li> </ul> However, some logical pages may overflow physical pages, which is most likely to happen with the page that contains your most seen type, especially if you've visited a fair amount of countries.

The set of programs contains an optional program to remove all data that does not relate to the current trip from this file, leaving only <ul> <li>five pages for the current trip</li> <li>a 'Totals per type' separator page, followed by one page for every type that appeared in the current trip,</li> <li>a 'Totals per country' separator page, followed by one page for every country that appeared in the current trip,</li> <li>a 'Totals per nationality' separator page, followed by one page for every nationality that appeared in the current trip, and</li> <li>a 'Totals per year' separator page, followed by five pages per year for the year(s) of the current trip,</li> </ul> which is kinder to trees, if you insist on also keeping the results on paper.

The daily summary output file: 'days.h-h'
This file contains one table with a line for every calendar day of every trip, detailing <ul> <li>the number of the trip</li> <li>the day in the trip</li> <li>the distance hitched during the day</li> <li>the (actual) driving time during the day</li> <li>the average velocity for the day</li> <li>the date</li> </ul>

A follow-up program will process this file, putting the original single column data in four columns of 70 rows. It also sorts the file into three additional orders, Distance, Time and Velocity. If the data is required to be in .RTF format, this program is required.

The formatted input data output file: 'trip.h-h'
This file merely puts the input data into a neat table (zapping the odometer and place of departure & arrival columns). The program will paginate trips that do not fit on A4 paper.