(by Julien Cornebise)
Determined to broaden my mind, I strayed from familiar fields of mine to more of a mine field: yesterday’s session on The Human Cultural and Social Landscape of Afghanistan and Iraq, organized by the Section on Statistics in Defense and National Security. Defense and the military are not my strong suit, so I was doubly curious. And I was really glad I went, on several levels: scientific, cultural, political, and human.
Starting with the human level: midway through the last talk, while I was engrossed in the mathematics, I was suddenly hit by the following down-to-earth thought:
Every single one of those dots on the curve is an actual dead human being.
Between 1 and 10,000 dead people. From casualty to massacre. Left of the axis: industrial butchery. Right of the axis: artisanal, hand-crafted death. While I am not faint of heart, this sudden surge of empathy into what I had so far seen as a purely intellectual challenge was mind-shaking. I was suddenly miles away from JSM, and from the numbers I usually manipulate. I was far from my theorems, my algorithms, and my comfy chairs. And although yesterday’s session on Ethics in Statistics was nowhere close to this topic, I felt strongly about the question of empathy in statistics, and of how you deal with such subjects. Although years ago I really enjoyed Marc Yor‘s intervention in a plenary lecture to high-schoolers, that “there’s more to do with maths than finance!“, I am not sure that either he or I was thinking of such topics. Thought-provoking. As a side note, it gave me a better understanding of Kristian Lum‘s enthusiasm for her work on disappearances in Colombia at Benetech.
Getting back to more mainstream considerations. On the scientific and cultural level, the first talk, by Y. H. Said, was a sociological description of the social structure of the tribal-patrilineal Afghan culture, very different from our own — a great incentive to re-read the history of the last 10 years. It shed new light on the relationship between local military/tribal chiefs and foreign forces, and on why the Taliban refused to turn in Bin Laden, on the grounds of the “Nanawatai“, the code of honor that dictates never refusing asylum, even if you do not like the person you are hosting. It also explained how trying to break down the drug trade is actually trying to break down the “qawm“, a flexible and evolving social network with encompassing links between very different actors whom, in Western society, we would not think to associate. The kind of talk it would be a blessing to spread to a broader audience, e.g. via the New York Times or popular journals.
The second talk took a more quantitative approach, with object modelling and UML diagrams to simulate precisely such networks. I was a bit less impressed, maybe because I am not sure we can yet apply such hard models to populations; Hari Seldon may not be with us yet. An interesting try, though!
The third and last talk of the session generated the most questions, and the most interest on my part. Tim Gulden presented an exploratory data analysis, via power laws, of the number of deaths in “incidents” in Iraq. He does not claim to be a statistician, but rather an expert in the application field, and he started with a comparison of the Guatemalan conflict (which involved both acts of war and acts of genocide, showing two different statistical patterns) and of the war in Kenya: this was as clear as it was convincing. He then moved on to studying the same kind of data for the Iraq conflict,
- showing where the data diverged from his model for specific years: a lack of high-valued points, i.e. incidents with a very large number of casualties;
- what manipulation of the data led to a better fit: scaling the ranks of the incidents up by 1%, i.e. positing an additional 1% of incidents with the highest death tolls;
- and what reality that manipulation could correspond to: the suppression of the three top levels of command structure in the Iraqi army, precisely what the US was focusing on in the first years of the conflict, leaving several small subgroups without the manpower for the largest offensives that typically lead to the 1% highest death tolls.
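To make this concrete, here is a minimal sketch, in Python, of what such a rank-size power-law fit might look like. This is my own illustration, not Tim Gulden’s actual code, and the `missing_frac` parameter is only my reading of the “scale the ranks up by 1%” adjustment he described:

```python
import numpy as np

def rank_size_slope(deaths, missing_frac=0.0):
    """Fit log(death toll) ~ a + b * log(rank) on a rank-size plot.

    missing_frac > 0 shifts every observed rank upward, as if that
    fraction of the deadliest incidents were missing from the record.
    Returns (intercept a, slope b); b is the power-law exponent.
    """
    sizes = np.sort(np.asarray(deaths, dtype=float))[::-1]  # largest first
    n = len(sizes)
    ranks = np.arange(1, n + 1, dtype=float) + missing_frac * n
    # Least-squares line in log-log space
    b, a = np.polyfit(np.log(ranks), np.log(sizes), 1)
    return a, b
```

On an idealized Zipf-like sample, with death tolls proportional to 1/rank, the fitted slope comes out close to -1; rerunning with `missing_frac=0.01` mimics the 1% rank-scaling manipulation above, so one can compare the two fits against the observed tail.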
Of course, there is a question of “if you look for something, you will find something”: how many of those findings are a nice hypothesis that happens to be backed up a posteriori by the data, and how many are genuine patterns discovered by the analysis? How much of it is a mathematical construct, how much a statistical analysis? I would love to know what Andrew thinks of it, especially given his Introductory Lecture. Again, Tim Gulden does not claim by any means to be a statistician — and I am sure, given the passion he transmitted, that he would very much welcome those willing to help! (His email is in his CV.)
Tim also mentioned his data source, the IBC database, an extensive database of press articles on the conflict. He honestly highlighted his caveats in this regard, and how he was unable to overturn or even assess the possible sampling bias. At this point came a very useful intervention from the audience: a (former?) military officer who served in Iraq and is writing his own book on the topic said that, in his experience, this database is the most accurate publicly available, as compared to the classified data to which he had access but was in no position to publish.
This led to my political lesson of the day: I candidly asked whether the Wikileaks Iraq War Logs could be used for such an analysis. And although both agreed that, indeed, it would be possible, Tim underlined that he is funded by the US Navy, and hence would worry that they would not be happy for him to look at them — note that he pointed out, and I understand, that military data might not bring more precision than journalists’ data, at least on the body count, given the lack of coverage of the most dangerous regions.
Of course, there is nothing surprising in this; the problem is the same with many sources of funding. However, it is an issue I rarely run into personally, and it was interesting to see it pop up here, rounding out the session nicely. Although understandable and logical, it nevertheless makes me twitch that a researcher as conscientious and passionate as Tim seems to be can fear for his funding if he uses data that have now been published in international newspapers (Guardian, New York Times, Spiegel)! Those are only my opinions, and I can imagine the alternate view that his funding at least allows him to do this research at all — but in my naive youth I am nevertheless inclined to ask for full freedom for researchers: you are funded to do research on a topic, without being limited in your inputs or your outputs. Again, this is my very personal opinion, by no means ASA’s or anyone else’s here, not even Tim’s, whose opinion on this I did not ask — I am very clear on this latter point. Just my own food for thought, and I would love to hear what ASA’s Committee on Professional Ethics would have to say.