Sharing Texas ‘Death Row’ execution data

Earlier today I came across a website that contained some incredibly intriguing data. The website belongs to the Texas Department of Criminal Justice, and the page of particular interest is a collection of records for the 500 ‘Death Row’ offenders who have been executed.

TexasDCJ

Each record details the offender, their demographics, a summary of the incident for which they were convicted and – most compelling of all perhaps – the final words uttered in their last statement.

TexasDCJ_LastStatement

Over the past couple of hours I’ve been wrestling with the website to try to pull all the data out into a nice, neat, workable format. I’ve partly succeeded and partly failed. I’ve been using Google Docs and a combination of the ImportHTML and ImportXML functions. They work quite well but have limitations, particularly with inconsistently structured websites, and also because you can only use a limited number of each function in a single spreadsheet.
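For anyone who would rather script this step than rely on spreadsheet functions, here is a rough sketch of the same idea in Python: pull the Nth table out of a page's HTML, much as ImportHTML does with its "table" query. To be clear, this is not what I used. The class and function names (TableExtractor, import_html_table) and the sample markup are made up for illustration, and the real page's structure may well differ.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the cell text of every <table> found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so cells come out tidy.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the rows of the index-th table (1-based, like ImportHTML)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


# Placeholder markup, NOT real data from the site.
SAMPLE = """
<table>
  <tr><th>Execution</th><th>Last Name</th><th>First Name</th></tr>
  <tr><td>2</td><td>Example</td><td>Person B</td></tr>
  <tr><td>1</td><td>Example</td><td>Person A</td></tr>
</table>
"""

rows = import_html_table(SAMPLE)
for row in rows:
    print(row)
```

In practice you would fetch the page's HTML first (for example with urllib.request) and pass it in as a string. The sketch assumes simple, non-nested tables; inconsistently structured pages will trip it up just as they trip up the spreadsheet functions.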

I have managed at least to pull out all the ‘Last statement’ data. It took me longer than I intended because I found myself reading the statements too much. The data I’ve temporarily given up on trying to extract is the Offender Information, which provides deeper details about the individuals and their crimes. I gave up because many records exist only as uploaded .jpg image files, which frankly makes it a non-starter.

TexasDCJ_Offender

Anyway, this isn’t just for me. I was really interested in the data and wanted the opportunity to practice scraping data from websites for an upcoming tutorial. I have shared the current version of the Google Spreadsheet, which is open to the public. You can find it by clicking on the image below.

ExecutionData

I’m not sure when/if I’ll get time to work on any designs with this, as much as I’d love to, but please do share anything that you work on (as well as any additional data you suck out of the site).


Update: Aleksey Nozdryn-Plotnicki has taken the task of scraping the data further. You can read more about it here and get the enhanced dataset here.