flypig.co.uk

Waste and recycling tracking

I generally try hard to minimise my environmental impact, but it can be a challenge without being aware of the actual effect that daily decisions have. Now that I live in a one-person flat, I'm much more conscious of exactly how much energy I use, how much stuff I consume, how many possessions I accumulate and how much waste all this activity produces.

I've therefore decided to keep track of how much waste I produce much more accurately. My apartment block in Tampere, Finland, provides bins for six types of waste: paper, card, glass, metal, compost and general waste. Everything except the last of these can be recycled (although whether the council does or not, I don't know). In addition, Finland has an exemplary network for financially-incentivised bottle and can returns, with return stations in pretty much every grocery shop. Each week I therefore find myself splitting my waste into eight different categories. It's a fair bit of effort, so taking weight measurements as well isn't such a big deal.

The following graph shows the week-by-week data I've collected about my waste output. I plan to update it every week. Click on the graph for a larger version.

 

 

Some brief points to note about the graph:

  1. The weekly average, calculated since the start, is shown in the vertical bar on the right hand side.
  2. All of the green items are recyclable and should be recycled by the council. The general waste in red isn't recycled.
  3. There's only me living in my flat, so this is output for a single person.
  4. I'll continue updating the graph on this page every week.

The data is already beginning to build up, so a reasonably clear picture is emerging. I'll continue collecting data to see how things are progressing, with the aim of reducing my waste output (both recyclable and non-recyclable) over time if I can.

 

Waste

19 Nov 2019 : Graphs of Waste, Part 2: A Continuous Histogram Approach #
In part one we looked at how graphs can be a great tool for expressing the generalities in specific datasets, but how even seemingly minor changes in the choice of graphing technique can result in a graph that tells an inaccurate story.

We finished by looking at how a histogram would be a good choice for representing the particular type of data I've been collecting, to express the quantity of various types of waste (measured by weight) as the area under the graph. Here's the example data plotted as a histogram.
 
All data plotted as a stacked histogram


While this is good at presenting the general picture, I really want to also express how my waste generation is part of a continuous process. In the very first graph I generated to try to understand my waste output, I drew the datapoints and joined them with lines. This wasn't totally crazy as it highlighted the trends over time. However, it gave completely the wrong impression because the area under the graph bore no relation to the amount of waste I produced.

How can we achieve both? Show a continuous change of the data by joining datapoints with lines, while also ensuring the area under the graph represents the actual amount of waste produced?

The histogram above achieves the goal of making the area under the graph represent the all-important quantities captured by the data. But it doesn't express the continuous nature of the data.

Contrariwise, if we were to take the point at the top of each histogram column and join them up, we'd have a continuous line across the graph, but the area underneath would no longer represent useful data.
If we want to capture a 'middle ground' between the two, it's helpful to apply some additional constraints.
  1. The line representing the weights should be continuous.
  2. The area under the line should be the same as the area under the histogram column for each column individually.
  3. For each reading, the line can be affected by the readings either side (this is inevitable if constraint 1 is going to be enforced), but should be independent of anything further away.

To do this, we'll adjust the position of the datapoints for each of the readings and introduce a new point in between every pair of existing datapoints as follows.
  1. Start with each datapoint horizontally centred in its column, with its y value set to the height of the histogram column that encloses it.
  2. For every pair of datapoints A and B, place an additional point at the boundary of the columns for A and B, and with y value set as the average between the two columns A and B.
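
As a minimal sketch of these two steps, the snippet below builds the list of points for the line from a set of column boundaries and column heights. The function name `midpoint_polyline` and the example numbers are illustrative assumptions, not the code used to produce the graphs here.

```python
def midpoint_polyline(edges, heights):
    """Build the points for the line described in the two steps above.

    edges   -- x positions of the column boundaries (one more than heights)
    heights -- height of each histogram column
    """
    points = []
    for i, h in enumerate(heights):
        left, right = edges[i], edges[i + 1]
        # Step 1: a point centred horizontally in the column, at the column height.
        points.append(((left + right) / 2, h))
        # Step 2: a point on the boundary shared with the next column,
        # at the average of the two column heights.
        if i + 1 < len(heights):
            points.append((right, (h + heights[i + 1]) / 2))
    return points


# Hypothetical example: three week-long columns with made-up heights (g/day).
edges = [0, 7, 14, 21]
heights = [60, 40, 46]
points = midpoint_polyline(edges, heights)
print(points)
```

The columns don't need to be the same width for this to work, since each boundary is read directly from `edges`.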

Following these rules we end up with something like this.
 
Plotting between the midpoint of each histogram column

This gives us our continuous line, but as you can see from the diagram, for each column the area under the line doesn't necessarily represent the quantity captured by the data. We can see this more easily by focussing in on one of the columns. The hatched area in the picture below shows area that used to be included, but which would be removed if we drew our line like this, making the area under the line for this particular region less than it should be.
 
Considering a single column of the histogram

Across the entire width of these graphs the additions might cancel out the subtractions, but that's not guaranteed, and it also fails our second requirement that the area under the line should be the same as the area under the histogram column for each column individually.
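
To make the mismatch concrete, here is a small sketch that compares the area under the two line segments crossing a single column with the area of the column itself. The function name and the numbers are hypothetical, chosen only for illustration.

```python
def area_under_midpoint_line(width, h, y1, y2):
    """Area under the segments (left boundary, y1) -> (centre, h) -> (right boundary, y2)."""
    left = (width / 2) * (y1 + h) / 2   # trapezium over the left half of the column
    right = (width / 2) * (h + y2) / 2  # trapezium over the right half of the column
    return left + right


# Hypothetical column: 7 days wide and 40 g/day high, with neighbours of
# 60 and 46 g/day, so the boundary points sit at 50 and 43.
width, h = 7, 40
y1, y2 = (60 + h) / 2, (h + 46) / 2

line_area = area_under_midpoint_line(width, h, y1, y2)
column_area = width * h
print(line_area, column_area)
```

In this made-up case the column is lower than both of its neighbours, so the line encloses 302.75 where the column holds 280: area has been added rather than cut off.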

To address this we can adjust the position of the point in the centre of each column by altering its height to capture the correct amount of area. In the case shown above, we'd need to move the point higher because we've cut off some of the area and need to get it back. In other cases we may need to reduce the height of the point to remove area that we over-captured.
 
The elements making up the column
The area under the lines for a column

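To calculate the exact height $y$ of the central point, we can use the following formula, where $h$ is the height of the histogram column, and $y_1$ and $y_2$ are the heights of the points on its left and right boundaries.

$$ y = 2h - \frac{1}{2} (y_1 + y_2) .
$$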
The area $A = A_1 + A_2 + A_3 + A_4$ under the line, for a column of width $w$, can then be calculated as follows.

\begin{align*}
A & = \left( \frac{w}{2} \times y_1 \right) + \left( \frac{w}{2} \times y_2 \right) + \left( \frac{1}{2} \times \frac{w}{2} \times (y - y_1) \right) + \left( \frac{1}{2} \times \frac{w}{2} \times (y - y_2) \right) \\
& = \frac{w}{2} \left( \frac{1}{2} y_1 + \frac{1}{2} y_2 + y \right) .
\end{align*}
Substituting $y$ into this we get the following.
\begin{align*} A & = \frac{w}{2} \left( \frac{1}{2} y_1 + \frac{1}{2} y_2 + 2h - \frac{1}{2} y_1 - \frac{1}{2} y_2 \right) \\ & = wh. \end{align*}

Which is the area of the column as required.
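
The same check can be run numerically. The sketch below applies the formula to every column and confirms that the area under the line over each column equals the column's width times its height. The function names and example numbers are hypothetical, and the treatment of the two outer edges (reusing the end column's own height) is an assumption, since the text only covers boundaries between columns.

```python
def adjusted_polyline(edges, heights):
    """Return line points whose area over each column equals width * height."""
    n = len(heights)
    # Boundary heights: the average of the two neighbouring columns.
    # Assumption: the outer edges reuse the height of the end columns.
    boundary = [heights[0]]
    boundary += [(heights[i] + heights[i + 1]) / 2 for i in range(n - 1)]
    boundary.append(heights[-1])

    points = [(edges[0], boundary[0])]
    for i, h in enumerate(heights):
        y1, y2 = boundary[i], boundary[i + 1]
        y = 2 * h - (y1 + y2) / 2  # adjusted height of the central point
        points.append(((edges[i] + edges[i + 1]) / 2, y))
        points.append((edges[i + 1], y2))
    return points


def area_under(points):
    """Area under a line through the points, summed trapezium by trapezium."""
    return sum((x2 - x1) * (ya + yb) / 2
               for (x1, ya), (x2, yb) in zip(points, points[1:]))


# Hypothetical example: three week-long columns with made-up heights (g/day).
edges = [0, 7, 14, 21]
heights = [60, 40, 46]
points = adjusted_polyline(edges, heights)

for i, h in enumerate(heights):
    column_points = points[2 * i:2 * i + 3]  # left boundary, centre, right boundary
    print(area_under(column_points), (edges[i + 1] - edges[i]) * h)
```

Each column contributes three points to the list, with the boundary points shared between neighbours, which is why the slice steps forward two at a time.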

Following this approach we end up with a graph like this.
 
Line after adjusting the midpoints to account for the area under the graph

Which taken on its own gives a clear idea of the trend over time, while still capturing the overall quantity of waste produced in each period as the area under the graph.
 
The line without the histogram, but still retaining the area-under-the-graph property

In the next part we'll look at how we can refine this further by rendering a smooth curve, rather than straight lines, but in a way that retains the same properties we've been requiring here.

If all goes to plan, part 3 should appear here on the 26th November.

All of the graphs here were produced using the superb MatPlotLib and the equations rendered using MathJax (the first time I'm using it, and it looks like it's done a decent job).
19 Nov 2019 : Graphs of Waste, Part 2 #
Part 2 of my series on embellishing histograms is now up on my blog. This post discusses a "continuous histogram" visualisation. It discusses how you can take data that accumulates over time, and that might usually be presented in a histogram, and instead render it using a continuous line without misrepresenting the data.
16 Nov 2019 : Waste data #
I've added another week's worth of data about my waste and recycling to the waste page. I made the mistake of trying to make Turkish Delight again this week (sadly still without any decent results). So, lots of grapefruit skins weighing down the compost. More concerning is that my general waste — the most damaging category — is up on last week by a big margin. It sounds terrible, but most of that was because I've been suffering from a bad cold and went through several packs of tissues (in Finland they come in packs, not boxes). Nobody benefitted from that! If you're taking an interest in my waste output, you might also be interested in my series of posts about the waste graphs I'm using. Part 1 is on my blog.
12 Nov 2019 : Graphs of Waste, Part 1 #
Over the next four weeks I'll be posting a series of articles on my blog about how I'm improving the graph on my waste page. The current graph is bad and needs fixing, and in the articles I plan to describe how. The first part entitled "Choose Your Graph Wisely" is now up on my blog.
12 Nov 2019 : Graphs of Waste, Part 1: Choose Your Graph Wisely #
I have to admit I'm a bit of a data visualisation pedant. If I see data presented in a graph, I want the type of graph chosen to match the expressive aim of the visualisation. A graph should always aim to expose some underlying aspect of the data that would be hard to discern just by looking at the data in a table. Getting this right means first and foremost choosing the correct modality, but beyond that the details are important too: colours, line thicknesses, axis formats, labels, marker styles. All of these things need careful consideration.

You may think this is all self-evident, and that anyone taking the trouble to plot data in a graph will obviously have taken these things into account, but sadly it's rarely the case. I see data visualisation abominations on a daily basis. What's more it's often the people you'd expect to be best at it who turn out to fall into the worst traps. Over fifteen years of reviewing academic papers in computer science, I've seen numerous examples of terrible data visualisation. These papers are written by people who have both access to and competence in the best visualisation tooling, and who presumably have a background in analytical thinking, and yet graphs presented in papers often fail the most basic requirements. It's not unusual to see graphs that are too small to read, with unlabelled axes, missing units, use of colour in greyscale publications, or with continuous lines drawn between unrelated discrete data points.

And that's without even mentioning pseudo-3D projections or spider graphs.

One day I'll take the time to write up some of these data visualisation horror stories, but right now I want to focus on one of my own infractions. I'll warn you up front that it's not a pretty story, but I'm hoping it will have a happy ending. I'm going to talk about how I created a most terrible graph, and how I've attempted to redeem myself by developing what I believe is a much clearer representation of the data.

Over the last couple of months I've been collecting data on how much waste and recycling I generate. Broadly speaking this is for environmental and motivational reasons: I believe that if I make myself more aware of how much rubbish I'm producing, it'll motivate me to find ways to reduce it, and also help me understand where my main areas for improvement are. If I'm honest I don't expect it'll work (many years ago I was given a device for measuring real-time electricity usage with a similar aim and I can't say that succeeded), but for now it's important to understand my motivations. It goes to the heart of what makes a good graphing choice.

So, each week I weigh my rubbish using kitchen scales, categorised into different types matching the seven different recycling bins provided for use in my apartment complex.
 
The bins at my apartment complex

Here's the data I've collected until now presented in a table.
 
Measurements of waste and recycling output (g)
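| Date | Paper | Card | Glass | Metal | Returnables | Compost | Plastic | General |
|---|---|---|---|---|---|---|---|---|
| 18/08/19 | 221 | 208 | 534 | 28 | 114 | 584 | 0 | 426 |
| 25/08/19 | 523 | 304 | 702 | 24 | 85 | 365 | 123 | 282 |
| 01/09/19 | 517 | 180 | 0 | 0 | 115 | 400 | 0 | 320 |
| 06/09/19 | 676 | 127 | 360 | 14 | 36 | 87 | 0 | 117 |
| 19/09/19 | 1076 | 429 | 904 | 16 | 0 | 1661 | 0 | 417 |
| 28/09/19 | 1047 | 162 | 1133 | 105 | 74 | 341 | 34 | 237 |
| 05/10/19 | 781 | 708 | 218 | 73 | 76 | 1391 | 54 | 206 |
| 13/10/19 | 567 | 186 | 299 | 158 | 40 | 289 | 63 | 273 |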

 
We can't tell a great deal from this table. We can certainly read off the measurements very easily and accurately, but beyond that the table fails to give any sort of overall picture or idea of trends.

The obvious thing to do is therefore to draw a graph and hope to tease out something that way. So, here's the graph I came up with, and which I've had posted and updated on my website for a couple of months.
 
Data plotted directly on a graph

What does this graph show? Well, to be precise, it's a stacked plot of the weight measurements against the dates the measurements were taken. It gives a pretty clear picture of how much waste I produced over a period of time. We can see that my waste output increased and peaked before falling again, and that this was mostly driven by changes in the weight of compost I produced.

Or does it? In fact, as the data accumulated on the graph, it became increasingly clear that this is a misleading visualisation. Even though it's an accurate plot of the measurements taken, it gives completely the wrong idea about how much waste I've been generating.

To understand this better, let's consider just one of the stacked plots. The red area down at the base is showing the measurements I took for general waste. Here's another graph that shows the same data isolated from the other types of waste and plotted on a more appropriate scale.
 
The line plotted for general waste

If you're really paying attention you'll notice that the start date on this second graph is different to that of the first. That's because the very first datapoint represents my waste output for the seven days prior to the reading, and we'll need those extra seven days for comparison with some of the other plots we'll be looking at shortly.

There are several things wrong with this plot, but the most serious issue, the one I want to focus on, is that it gives a completely misleading impression of how much waste I've been generating. That's because the most natural way to interpret this graph would be to read off the value for any given day and assume that's how much waste was generated that day. This would leave the area under the graph being the total amount of waste output. In fact the lines simply connect different data points. The actual datapoints themselves don't represent the amount of waste generated in a day, but in fact the amount generated in a week. And because I don't always take my measurements at the same time each week, they don't even represent a week's worth of rubbish. To find out the daily waste generated, I'd need to divide a specific reading by the number of days since the last reading.
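
As a rough sketch of that division, the snippet below takes the general waste column from the table above and works out the number of days each reading covers and the resulting grams per day. The function name is hypothetical, and the seven-day period assumed for the first reading follows the note above about the first datapoint.

```python
from datetime import date

# Reading dates and general waste weights (g), copied from the table above.
readings = [
    (date(2019, 8, 18), 426),
    (date(2019, 8, 25), 282),
    (date(2019, 9, 1), 320),
    (date(2019, 9, 6), 117),
    (date(2019, 9, 19), 417),
    (date(2019, 9, 28), 237),
    (date(2019, 10, 5), 206),
    (date(2019, 10, 13), 273),
]


def daily_rates(readings, first_period_days=7):
    """Return (date, days_covered, grams_per_day) for each reading.

    The first reading has no predecessor, so it is assumed to cover
    first_period_days days.
    """
    rates = []
    previous = None
    for when, grams in readings:
        days = first_period_days if previous is None else (when - previous).days
        rates.append((when, days, grams / days))
        previous = when
    return rates


rates = daily_rates(readings)
for when, days, rate in rates:
    print(f"{when:%d/%m/%y}: {days:2d} days, {rate:5.1f} g/day")
```

The readings turn out to cover anything from 5 to 13 days, which is exactly why the raw weights can't be compared directly.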

Take for example the measurements taken on the 6th September. I usually weigh my rubbish on a Saturday, but because I went on holiday on the 7th I had to do the weighing a day early. Then I was away from home for seven days, came back and didn't then weigh my rubbish again until the 19th, nearly two weeks later.

Although I spent a chunk of this time away, it still meant that the reading was high, making it look as if I'd generated a lot of waste over the two-week period. In fact, considering this was double the time of the usual readings, it was actually a relatively low reading. This should be reflected in the graph, but it's not. It looks like I generated more rubbish than expected; in fact I generated less.

We can see this more clearly if we plot the data as a column (bar) graph and as a histogram. Here's the column graph first.
 
General waste plotted as a bar chart

These are the same datapoints as in the previous graph, but drawn as columns with widths proportional to the duration that the readings represent. The column that spreads across from the 6th to the 19th September is the reading we've just been discussing. This is a tall, wide column because it represents a long period (nearly two weeks) and a heavier than usual weight reading (because it's more than a week's worth of rubbish). If we now convert this into a histogram, it'll give us a clearer picture of how much waste was being generated per day.
 
General waste plotted as a histogram

This histogram takes each of the columns and divides it by the number of days the column represents. A histogram has the nice property that the area — rather than the height — of a column represents the value being plotted. In this histogram, the area under all of the columns represents the quantity of waste that I've generated across the entire period: the more blue, the more waste.
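
A minimal sketch of this conversion, using the general waste figures from the table, is shown below. Each column is described by its start date, its width in days and its height in grams per day, and the total area is compared with the total weight recorded. The function name is hypothetical, and the seven-day width of the first column is an assumption based on the earlier note about the first datapoint.

```python
from datetime import date, timedelta

# Reading dates and general waste weights (g), copied from the table.
dates = [
    date(2019, 8, 18), date(2019, 8, 25), date(2019, 9, 1), date(2019, 9, 6),
    date(2019, 9, 19), date(2019, 9, 28), date(2019, 10, 5), date(2019, 10, 13),
]
general = [426, 282, 320, 117, 417, 237, 206, 273]


def histogram_columns(dates, weights, first_period_days=7):
    """Return (start_date, width_days, height_g_per_day) for each reading."""
    columns = []
    start = dates[0] - timedelta(days=first_period_days)
    for end, grams in zip(dates, weights):
        width = (end - start).days
        columns.append((start, width, grams / width))  # height = weight per day
        start = end
    return columns


columns = histogram_columns(dates, general)
total_area = sum(width * height for _, width, height in columns)
print(round(total_area), sum(general))
```

Columns in this form map naturally onto a bar plot with variable widths, for example Matplotlib's `bar` called with a `width` sequence and `align='edge'`, although that's only one way to draw them.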

Not only is this a much clearer representation, it also completely changes the picture. The original graph made it look like my waste output peaked in the middle. There is a slight rise in the middle, but it's actually just a local maximum. In fact the overall trend was that my daily general waste output was decreasing until the middle of the period, and then rose slightly over time. That's a much more accurate reflection of what actually happened.

It would be possible to render the data as a stacked histogram, and to be honest I'd be happy with that. The overall picture, which ties in with my motivation for wanting the graph in the first place, indicates how much waste I'm generating based on the area under the graph.
 
All data plotted as a stacked histogram

But in fact I tend to be generating small bits of rubbish throughout the week, and I'd like to see the trend between readings, so it would be reasonable to draw a line between weeks rather than have them as histogram blocks or columns.

So this leads us down the path of how we might draw a graph that captures these trends, but still also retains the nice property that the area under the graph represents the amount of waste produced.

That's what I'll be exploring in part two.

All of the graphs here were generated using the superb MatPlotLib.
10 Nov 2019 : Waste data #
I've added this week's waste measurements to the waste page. This week I tried to make Turkish Delight, which involved squeezing five big ol' grapefruit. The massive increase in compostable waste is down to the leftover grapefruit skins. Unfortunately the Turkish Delight turned out terribly. I'm now eating it as jam instead.
3 Nov 2019 : Waste data dump #
I've added more data to my waste and recycling tracking page. It was a lean fortnight, but mostly because I was away in the UK for half of the time. Even taking this into account though, my waste output is down across the board with the exception of a small increase (a tin-can's worth) in metal. Let's see what happens in future weeks as winter draws in; that should give a clearer picture.
19 Oct 2019 : More waste, more data #
Today I added more waste data to my recycling and waste graph. The overview is that glass is up for some reason, whilst compost is down. That's good because I've been making a special effort not to waste food this week. I dumped a bunch of newspapers that have been stacking up, which masks the fact my paper reduction plan seems to be working: I received no junk mail at all this week!
13 Oct 2019 : Waste and recycling data #
Another week, another round of rubbish weighed. I'm pleased it went down a bit this week from 2.5kg to 1.8kg total, mostly due to a big reduction in compostables being thrown away this week. Weighing my rubbish has highlighted how much of it comes from junk mail, so yesterday I added a note to my door that reads "Ei mainoksia kiitos" ("No ads please"). Let's see if that reduces my paper waste in future.
5 Oct 2019 : Waste and recycling data #
I've weighed my waste and the new numbers have been added to my waste tracking page. This week compost and card are up, while glass is down. My average is still around the 2.5kg level.
28 Sep 2019 : Waste and recycling data #
I've decided to start collecting data on how much waste I produce each week. Might help me reduce it over time. Check out my new waste info page for the full details.
 
