Abbott Katz details spreadsheet techniques for prospecting the stories beneath the rows and columns.

Adventures in spreadsheet journalism - a text-book problem: Part 1

Kudos to Princeton University Press for having switched on the bright and interesting idea of unrolling its catalogue of titles into spreadsheet form, one each per academic discipline.  I majored in sociology (but don’t hold that against me), and a click on that very name on the Princeton site brought me to the worksheet excerpted below:

Straightforward enough (though you’ll need to modulate the column widths) if not conspicuously stirring, though I would submit that a spell of reflection would expose some interesting story angles, e.g., price data, title and subtitling trends, etc, all the more interesting were you to copy and paste the titles from all the discipline spreadsheets into one mega-workbook, but hang on; do you have the sneaking suspicion that something is missing?

I share your suspicion; because hidden in plain sight, above row 1, is…nothing. That is, there are no headers heading the data, and the Beatles wrote a song about this kind of omission: You Can’t Do That. Spreadsheet data management – sorting, filtering, or pivot tabling, and the like – can’t productively proceed without headers, unless you’re prepared to do some massive hoop-jumping that, by any reckoning, would qualify as wasted motion.

And exactly why have the data been denied their headers? I put that question to the Princeton press webmaster (gender indeterminate; the email was signed webmaster), whose reply was both swift and most interesting: the data are header-free because “the columns are self-explanatory”.

Well, that judgement may – may – be true, but even so, it rather misses the point, for the reasons recounted two paragraphs ago. Moreover, I had to guess what the initialed data in the fourth column – the Ps and Cs – represent, though it turns out I guessed correctly (they stand for paper and cloth, respectively, though I’ve since encountered other initials as well, including V and E. The latter might abbreviate E-book, though).

In any event, you need to compose headers, by inserting a row above the current row 1 and inventing your titles. But there’s another problem stalking the sheet – something impairing the data in column G, which record which I take to be the British, pound-based prices for the books . It turns out those prices have been formatted as text and as such are numerically inert, meaning they can’t be added or averaged, though they can be counted. Try incorporating any of the “prices” into a formula and you’ll see what I mean. Another betrayal of the data’s text-formatted status: click on any one of the cells and gaze at the formula bar, e.g.,

If that figure were truly a number you’d see just that – 22.95 – in the bar; an authentic currency format, be it dollars or pounds, wouldn’t presume itself upon the number. Currency formats are phantom embellishments which in no way modify the actual number with which they’re teamed. The formula bar conveys a number’s actual value – and nothing else.

The task, then: to convert all those faux text “values” into actual, usable ones. The means for doing so: none other than the Text-to-Columns option. Don’t go away.

 

Sign up for our free weekly digital magazine, Press Gazette Journalism Weekly, and daily newsletter
To contact Press Gazette with a story call 0207 936 6433
or email pged@pressgazette.co.uk
To advertise, please call 0207 936 6764.