Recently, I’ve started to analyse data gathered via the online questionnaire which is central to my thesis. This means having to get acquainted with a little bit more statistical analysis than I am comfortable with – i.e. pushing beyond the basics of mean, median, mode, and standard deviation, although naturally I first had to refresh my memory of those as well. Armed with a copy of IBM’s excellent SPSS program (essentially, software for undertaking both basic and complex statistical analyses) and a copy of Julie Pallant’s SPSS Survival Manual, I have begun the complicated cycle: deriving numbers from text ➞ recoding existing numbers into other numbers ➞ extrapolating from those numbers other, more illuminating numbers ➞ interpreting and then turning these new numbers back into narrative and prose! Let’s just say that adjusting my research questions into something that will conform to the mystical world of dependent and independent variables is an intriguing process. Initial tests have led me to make a number of observations, some of which I think are worth sharing, especially with other humanities/social science researchers:
- Contrary to popular misconceptions about the coolly objective, operating-manual style of science, there are, if you care to look beyond the basics, almost as many disagreements about method, applicability and interpretation when it comes to statistics as there are about whether or not god exists. Well, okay, maybe not quite as many. But you get my point.
- The reassuring tone of a beginner’s textbook is wonderful but also dangerous. Particular authors will recommend making certain assumptions and using certain techniques that other authors argue just as convincingly against. Using one over the other may appear a trivial decision, if you are even aware (as a novice) of the debate to begin with. In reality, the decision you make about which author to trust can make a huge difference to the output you end up with. An output that cuts (or seems to) right to the heart of your research.
- Debates flagged up in various books are troubling and usually glossed over – can we really charge ahead with parametric tests when the data do not look very normal? To what extent is it justifiable to manipulate (i.e. alter) data so that different, more “robust” tests can be used? If I will never in a million years understand the maths behind a given procedure, how confident can I ever really be about using it?
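To make the transformation question in the last point a little more concrete, here is a minimal Python sketch (not SPSS, and not the procedure recommended by any particular textbook). It uses sample skewness as a rough indicator of how far a variable departs from a symmetric, normal-looking shape, and shows how a log transformation changes that indicator. The `hours` values are invented purely for illustration and the helper name `skewness` is hypothetical; skewness on its own is not a formal test of normality.

```python
import math
import statistics


def skewness(values):
    """Population-style sample skewness: close to 0 for symmetric data,
    positive when there is a long tail of large values."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)


# Hypothetical questionnaire variable (made-up numbers, not real survey data):
# hours per week spent on some activity, with a few very heavy users.
hours = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 35]

raw_skew = skewness(hours)
# "Manipulating" the data: take the natural log of every response.
log_skew = skewness([math.log(h) for h in hours])

print(f"skewness of raw values: {raw_skew:.2f}")
print(f"skewness of log values: {log_skew:.2f}")
```

On made-up data like this, the log-transformed values come out much less skewed than the raw ones. That is exactly why the choice matters: the same responses can look suitable or unsuitable for a parametric test depending on a decision taken before the test is even run.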
As a result of all of this, statistics are often sloppily applied or deliberately misused; researchers proceed from all the wrong assumptions because they don’t really know what they are dealing with, or because they already know what result they want. They also know that nobody will really dig very deeply anyway, since most readers can be assumed to skip ahead to the conclusions. Naturally, there will be differences according to academic field (very relevant for my work!) in how statistics are perceived, used and justified. Young Min Baek writes of statistics in communication studies:
Like most social scientific terms, statistical terms and their findings are academically and/or socially (re)constructed facts. Statistical methods are not given, but created and (re)constructed for specific reasons in various disciplines before the birth of the communication field. Methodological myths, such as subjectivity or neutrality, are reinforced by learning of statistics as something given, not as something constructed. Learning something established does not demand critical minds that statistics can be changed for more appropriate understanding of communication. Communication students simply learn statistics from a communication methodology course, or an introductory statistics course. Most, if not all, students rarely have an interest in how statistical terms or concepts are born and (re)constructed throughout intellectual history in diverse academics. They just learn the basic logic and its applications to the understanding of social worlds.1
A friend who knows just a little bit more about all this than me suggested:
If you want to get some excitement out of statistics, ignore classical probability theory and use quantum probabilities. Statistics could be more fun than the usual Kolmogorovian bore, if only statisticians would not be so boring themselves…
Hmm. Right. I think maybe what he means by that is that standard statistical methods do not capture the subtlety at the heart of chaotic “reality”. But I can’t be sure. Software helps us but also flatters us, letting us click buttons and tick boxes to pretend that we are in some ways mathematicians. For that, I am grateful but also (as a “truth-seeker”) a little concerned. Whether I can do any more than learn the basic logic is unclear, but at least I am aware of some of these issues. I have plenty more analysis ahead of me, and I’m sure it’s going to continue being challenging, infuriating, fun, and informative. Right now though, I feel like Mulder in The X-Files – the truth is out there, but I’m not sure if I will ever be able to prove it, or even if proof is the most relevant concept…watch this space!
1. Baek, Y. M. (2008). The role of social statistics in communication research. Paper presented to the Rhetoric of Science and Technology Division for the 2008 National Communication Association; San Diego, November.
On Thursday I took my first trip to the Steel City, to present a paper at Sheffield University’s inaugural iFutures conference. Having been a student again for 2 years now, the 5am start was a bit of a shock to the system, so I was very happy to find a lovely little on-campus cafe selling amazingly fluffy two-egg omelettes and a decent Fairtrade coffee (extra strong, naturally). Wolfing these down and wondering why, in 30 years, I’d never before heard of Yorkshire’s “famous” Henderson’s Relish (have you?), I perused the day’s programme and gave my slides a final once-over. The conference – tagline: “the next 50 years”, since Sheffield’s iSchool is currently celebrating its 50th birthday – was run entirely by postgrads and aimed to provide a “forum for students to present forward-thinking and challenging research” in an “encouraging environment”. The organisers had accordingly “blocked” (in tongue-in-cheek fashion) their iSchool seniors from attending, focussing instead on attracting an audience of young/early-career academics. This worked out well; the event was no less intellectual, stimulating or professional, but for the students presenting, the day was made less intimidating in that ideas could be exchanged and space carved out more freely without fear of overtly supervisory objections.
Topics included the impact of ICTs on informal scientific communication, institutional repositories in Thailand, chemoinformatics, telehealth project management, the ways in which public libraries can proactively support and respond to their communities, and a “radical” new approach to the analysis of classification schemes. A post-lunch Pecha Kucha session saw us voting via an “audience participation device” for the best and most engaging presenter. Pecha Kucha, if you haven’t come across it, is a trendy but very fun method of rapid-fire presentation – 20 slides are pre-programmed to be on screen for only 20 seconds each, meaning that the presenter ends up “pitching” a vision as much as opening up a debate and therefore has to be more creative. Facing stiff competition, Simon Wakeling’s take on the Future of the Filter Bubble was judged most worthy of a prize. My own full-length paper, which was also well received, was more traditional, describing a methodology for assessing academics’ attitudes toward new media and explaining why that matters.
So what is the future of our field, which might broadly be called “Information Science”? Predicting the future is a dubious enterprise, and in an age of almost maniacal technological development, it becomes even harder to know what is scientifically probable and what is just science fiction. Still, disclaimers aside, we can make some informed speculations based on current socio-technical trends. Two impressive keynote speakers – Professor Diane Sonnenwald (University College Dublin and the University of North Carolina at Chapel Hill) and Vanessa Murdock (Principal Applied Researcher at Microsoft Research) – were on hand to share their views. Coming from quite different perspectives, both shared thoughts about where information science should, or might, concentrate its energies. As a group, we possess much expertise that could help solve pressing social and environmental problems: failing health, climate change, inequality, global insecurity. While remedies for these might be figured out by analysts of the “big data” coming from scientific sensors and digitally mediated environments, disaster prevention initiatives and “crisis informatics” will only be successful if those creating systems, strategies and technologies are supported by experts able to assess their impacts on work patterns, task performance, and their wider (often unconsidered) socio-cultural effects.
Describing her own research into 3D medical telepresence devices, Professor Sonnenwald emphasised that information professionals must make sure we are “at the table” when research projects and funding priorities are discussed institutionally and internationally. The kind of analyses that we undertake may lead to short-term headaches for those developing products – for example, one of her studies showed a particular device to be more flawed than its initial backers supposed – but in the long run, this is a good thing not just for them but for all of us. It’s cheaper to address design issues pre- rather than post-production, and, economics aside, we must make sure that the groups whose problems we try to solve are not inadvertently given more of them by shimmering but naively designed solutions. In an age of algorithms and automation, information science is far from redundant.
Vanessa Murdock focussed on how we can map the world and its preoccupations through the harvesting and analysis of social media data. Location-aware and location-based services on smartphones and web browsers are one obvious example; Microsoft and others are working hard to build the “hyper local” as well as the personalised into their products. If you’re in Oslo and you fancy a pizza, wouldn’t it be nice to see at a click which restaurant near you has a menu to match your dietary requirements, what other customers thought about it, and where, based on your tastes, you might go afterwards? Less trivially, it would be valuable for sociologists, political economists and others to discover reliably where most tweets about Revolution X are coming from, in order to ascertain the demographics of those tweeting them and what percentage of the population they actually represent. Naturally such applications are not without their issues. We need to think deeply about privacy, data protection, regulation and – at a technical level – the reliability of services based on data which are often difficult to interpret syntactically and semantically. Further, aren’t companies really just servicing the “Technorati”, treating their needs and preferences as typical of humanity’s when in fact they are only a small (and, it might be argued, insubstantial) minority? Reminding us of the need to understand the difference between solutions that work on “toy data” or simplified abstract models and those which work when applied to reality, Murdock also pointed out that while “you should take the noble path and build things which are useful when possible, there is also a role for building things which are cool!”
Sheffield has about 60 PhD students working in the two main research groups of its Information School, and it seems that the culture there is as lively as it is cutting edge. All of the presenters were really impressive and I’d like to thank the committee for putting together such a fun event. 🙂
We nearly made it through the winter without any snow…but suddenly with a drop in temperature it’s been descending in lovely soft flakes all day, insistently refusing to melt. This reminds me that (for some forgotten reason) I was reading recently about a man nicknamed “Snowflake Bentley” – a self-taught farmer from Vermont who invented one of the earliest methods of microphotography; using a bellows camera and microscope, he eventually managed to document thousands of snowflakes (or, snow crystals), catching them against black velvet before they were lost forever.
Although there have been musings on snowflakes since at least the time of Johannes Kepler, who figured out a lot about their mathematics, it’s thanks to Bentley that we know no two snowflakes are alike. It’s also thanks to him that modern-day residents of his hometown of Jericho are able to make some extra money selling a variety of snowflake-themed merchandise. I wonder if an enthusiastic amateur would be able these days to make such an immense contribution to science (and to art – because surely that’s what these photographs can be considered)?
A later, more traditional snowflake researcher – if such a thing exists – was Ukichiro Nakaya, a physicist who developed a classification system for snowflakes using images highly influenced by Bentley’s work. In 1936, Nakaya created the first ever artificial snow in his Low Temperature Science Laboratory at Hokkaido University. A museum in his name now displays a range of historic and modern exhibits while involving visitors of all ages in snow-related activities and competitions. The best part – or the geekiest depending on your point of view – is that the museum building is shaped hexagonally to reflect the structure of snowflakes. 🙂
Odd to say that my own camera isn’t as powerful as those ones from a century ago; but anyway I’ve tried to evoke some of the gorgeous white views outside with these photos. In some kind of Christmas-card re-enactment I even saw a robin sitting on the fencepost by my window. He flew away too fast for me to capture him.