{
"cells": [
{
"cell_type": "markdown",
"id": "1829a544-9bdf-442e-a248-29af0debe81c",
"metadata": {},
"source": [
"# Data Wrangling NI House Price Index Data\n",
"\n",
"This is a 'messy' 'blog post' that's just a braindump of a notebook to step through [NI House Price Index](https://www.nisra.gov.uk/statistics/housing-community-and-regeneration/northern-ireland-house-price-index) datasets I was playing around with. \n",
"\n",
"It's mostly code, so if you were here from some 'insight', feck aff. \n",
"\n",
"There is **no** analysis here, this is **just** data wrangling.\n",
"\n",
"TLDR As always, Government Open Data has over the years gone from 'non-existent' to 'garbeled' to 'inconsistent' and I feel is now in the stage of 'consistently inconsistent', which is progress in my eyes. \n",
"\n",
"# Preamble Code, move on."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8b2cf13d-c9a6-4232-bfb8-c8b9f6a9b979",
"metadata": {},
"outputs": [],
"source": [
"from bs4 import BeautifulSoup\n",
"import pandas as pd\n",
"import requests\n",
"\n",
"# Pull the latest pages of https://www.finance-ni.gov.uk/publications/ni-house-price-index-statistical-reports and extract links\n",
"\n",
"base_url= 'https://www.finance-ni.gov.uk/publications/ni-house-price-index-statistical-reports'\n",
"base_content = requests.get(base_url).content\n",
"base_soup = BeautifulSoup(base_content)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5016c0a3-b75f-47b9-aef0-3c1f312bc8fc",
"metadata": {},
"outputs": [],
"source": [
"for a in base_soup.find_all('a'):\n",
" if a.attrs.get('href','').endswith('xlsx'):\n",
" source_name, source_url = a.contents[1],a.attrs['href']\n",
" \n",
"source_df = pd.read_excel(source_url, sheet_name = None) # Load all worksheets in \n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e1778edd-7203-4568-b49d-c7b2574de9ee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['Cover Sheet', 'Contents', 'Table 1', 'Table 2', 'Table 2a', 'Table 2b', 'Table 2c', 'Table 2d', 'Table 3', 'Table 3a', 'Table 3b', 'Table 3c', 'Table 4', 'Fig 5', 'Table 5', 'Table 5a', 'Fig 6', 'Table 6', 'Table 7', 'Table 8', 'Table 9', 'Table 9a', 'Table 9b', 'Table 9c', 'Table 9d', 'Table 10a', 'Table 10b', 'Table 10c', 'Table 10d', 'Table 10e', 'Table 10f', 'Table 10g', 'Table 10h', 'Table 10i', 'Table 10j', 'Table 10k'])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source_df.keys()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "156e22d9-a7b8-46f8-9e65-72d2b650ddb3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table of Contents | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Worksheet Name | \n",
" Frequency | \n",
" House Price Index - Quarter 4 2021 | \n",
"
\n",
" \n",
" 1 | \n",
" Table 1 | \n",
" Quarterly | \n",
" Table 1: NI HPI Trends Q1 2005 - Q4 2021 | \n",
"
\n",
" \n",
" 2 | \n",
" Figure 1 | \n",
" Quarterly | \n",
" Figure 1: Graph of NI HPI Q1 2005 - Q4 2021 | \n",
"
\n",
" \n",
" 3 | \n",
" Figure 1a | \n",
" Quarterly | \n",
" Figure 1a: Graph of Percentage Quarterly Chang... | \n",
"
\n",
" \n",
" 4 | \n",
" Figure 1b | \n",
" Quarterly | \n",
" Figure 1b: Graph of Percentage Annual Change Q... | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 58 | \n",
" Table 10h | \n",
" Quarterly | \n",
" Table 10h: Number of Verified Residential Prop... | \n",
"
\n",
" \n",
" 59 | \n",
" Table 10i | \n",
" Quarterly | \n",
" Table 10i: Number of Verified Residential Prop... | \n",
"
\n",
" \n",
" 60 | \n",
" Table 10j | \n",
" Quarterly | \n",
" Table 10j: Number of Verified Residential Prop... | \n",
"
\n",
" \n",
" 61 | \n",
" Table 10k | \n",
" Quarterly | \n",
" Table 10k: Number of Verified Residential Prop... | \n",
"
\n",
" \n",
" 62 | \n",
" Figure 11 | \n",
" Quarterly | \n",
" Figure 11: Number of Verified Residential Prop... | \n",
"
\n",
" \n",
"
\n",
"
63 rows × 3 columns
\n",
"
"
],
"text/plain": [
" Table of Contents Unnamed: 1 \\\n",
"0 Worksheet Name Frequency \n",
"1 Table 1 Quarterly \n",
"2 Figure 1 Quarterly \n",
"3 Figure 1a Quarterly \n",
"4 Figure 1b Quarterly \n",
".. ... ... \n",
"58 Table 10h Quarterly \n",
"59 Table 10i Quarterly \n",
"60 Table 10j Quarterly \n",
"61 Table 10k Quarterly \n",
"62 Figure 11 Quarterly \n",
"\n",
" Unnamed: 2 \n",
"0 House Price Index - Quarter 4 2021 \n",
"1 Table 1: NI HPI Trends Q1 2005 - Q4 2021 \n",
"2 Figure 1: Graph of NI HPI Q1 2005 - Q4 2021 \n",
"3 Figure 1a: Graph of Percentage Quarterly Chang... \n",
"4 Figure 1b: Graph of Percentage Annual Change Q... \n",
".. ... \n",
"58 Table 10h: Number of Verified Residential Prop... \n",
"59 Table 10i: Number of Verified Residential Prop... \n",
"60 Table 10j: Number of Verified Residential Prop... \n",
"61 Table 10k: Number of Verified Residential Prop... \n",
"62 Figure 11: Number of Verified Residential Prop... \n",
"\n",
"[63 rows x 3 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source_df['Contents']"
]
},
{
"cell_type": "markdown",
"id": "48b6a214-7b9a-447c-9857-31b89f978b72",
"metadata": {},
"source": [
"### Fix the Contents sheet to correctly reflect the Worksheet names\n",
"And fix the table headers and sheet-titles while we're at it. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2d3d3feb-8610-4913-9f45-acd89ef191fb",
"metadata": {},
"outputs": [],
"source": [
"new_header = source_df['Contents'].iloc[0]\n",
"source_df['Contents'] = source_df['Contents'][1:]\n",
"source_df['Contents'].columns = new_header"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6c18f4f4-f17d-4667-99ee-a2714d1f97c5",
"metadata": {},
"outputs": [],
"source": [
"source_df['Contents'].columns = [*new_header[:-1],'Title']"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "dfd9724d-ca78-4742-9703-3b42432276fd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Table 1: NI HPI Trends Q1 2005 - Q4 2021',\n",
" 'Table 2: NI HPI & Standardised Price Statistics by Property Type Q4 2021',\n",
" 'Table 2a: NI Detached Property Price Index Q1 2005 - Q4 2021',\n",
" 'Table 2b: NI Semi-Detached Property Price Index Q1 2005 - Q4 2021',\n",
" 'Table 2c: NI Terrace Property Price Index Q1 2005 - Q4 2021',\n",
" 'Table 2d: NI Apartment Price Index Q1 2005 - Q4 2021',\n",
" 'Table 3: NI HPI & Standardised Price Statistics by New/Existing Resold Dwelling Type Q4 2021',\n",
" 'Table 3a: NI New Dwelling Price Index Q1 2005 - Q4 2021',\n",
" 'Table 3b: NI Existing Resold Dwellings Price Index Q1 2005 - Q4 2021',\n",
" 'Table 3c: Number of Verified Residential Property Sales by New/Existing Resold Dwellings Q1 2005 - Q2 2021',\n",
" 'Table 4: Number of Verified Residential Property Sales Q1 2005 - Q4 2021',\n",
" 'Table 5: HPI & Standardised Price for each Local Government District in NI',\n",
" 'Table 5a: Number of Verified Residential Property Sales by Local Government District Q1 2005 - Q4 2021',\n",
" 'Table 6: NI HPI & Standardised Price by Urban and Rural areas of Northern Ireland',\n",
" 'Table 7: Standardised House Price & Index for Rural Areas of Northern Ireland by drive times',\n",
" 'Table 8: Number of Verified Residential Property Sales for Urban and Rural Areas of NI (Q1 2005 - Q4 2021) and Rural Areas of NI by drive times (Q1 2015 - Q4 2021)',\n",
" 'Table 9: NI Average Sale Prices All Properties Q1 2005 - Q4 2021',\n",
" 'Table 9a: NI Average Sale Prices Detached Properties Q1 2005 - Q4 2021',\n",
" 'Table 9b: NI Average Sale Prices Semi-Detached Properties Q1 2005 - Q4 2021',\n",
" 'Table 9c: NI Average Sale Prices Terrace Properties Q1 2005 - Q4 2021',\n",
" 'Table 9d: NI Average Sale Prices Apartments Q1 2005 - Q4 2021',\n",
" 'Table 10a: Number of Verified Residential Property Sales by Type in Antrim and Newtownabbey Council Q1 2005 - Q4 2021',\n",
" 'Table 10b: Number of Verified Residential Property Sales by Type in Ards and North Down Council Q1 2005 - Q4 2021',\n",
" 'Table 10c: Number of Verified Residential Property Sales by Type in Armagh City, Banbridge and Craigavon Council Q1 2005 - Q4 2021',\n",
" 'Table 10d: Number of Verified Residential Property Sales by Type in Belfast Council Q1 2005 - Q4 2021',\n",
" 'Table 10e: Number of Verified Residential Property Sales by Type in Causeway Coast and Glens Council Q1 2005 - Q4 2021',\n",
" 'Table 10f: Number of Verified Residential Property Sales by Type in Derry City and Strabane Council Q1 2005 - Q4 2021',\n",
" 'Table 10g: Number of Verified Residential Property Sales by Type in Fermanagh and Omagh Council Q1 2005 - Q4 2021',\n",
" 'Table 10h: Number of Verified Residential Property Sales by Type in Lisburn and Castlereagh Council Q1 2005 - Q4 2021',\n",
" 'Table 10i: Number of Verified Residential Property Sales by Type in Mid and East Antrim Council Q1 2005 - Q4 2021',\n",
" 'Table 10j: Number of Verified Residential Property Sales by Type in Mid Ulster Council Q1 2005 - Q4 2021',\n",
" 'Table 10k: Number of Verified Residential Property Sales by Type in Newry, Mourne and Down Council Q1 2005 - Q4 2021']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[t for t in source_df['Contents']['Title'].values if t.startswith('Table')]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7f2f84a6-c177-49e5-b88a-f41ab98cc112",
"metadata": {},
"outputs": [],
"source": [
"# Replace 'Figure' with 'Fig' in 'Worksheet Name'\n",
"with pd.option_context('mode.chained_assignment',None):\n",
" source_df['Contents']['Worksheet Name'] = source_df['Contents']['Worksheet Name'].str.replace('Figure','Fig')"
]
},
{
"cell_type": "markdown",
"id": "5b9c9918-e02f-4584-9588-60af306cef39",
"metadata": {},
"source": [
"## Tidy up Data\n",
"\n",
"### General Methodology\n",
"\n",
"Ignore figure data (pretty much completly....)\n",
"\n",
"Tables have more or less the same structure; a header on row 3(1), a year and quarter 'index' (on time series; otherwise categorical index, see Table 2, Table 3).\n",
"\n",
"Some TS tables _also_ have totals subsections so these should be a) validated and b) ignored.\n",
"\n",
"Any columns with no header in row 3(1) should be ignored (usually text notes)\n",
"\n",
"_Operate Sequentially_ (i.e. Table 1, Table 2, Table 2a; don't skip, even if it's tempting)\n",
"\n",
"Use keys from 'Contents' to describe data, but **may be suffixed by the date which could change between data sets!**\n",
"\n",
"There's also some really columns that look like checksums, so if there is an 'NI' column, or a data column that all valid values are '100', delete it.\n",
"\n",
"### Table 1: NI HPI Trends Q1 2005 - Q4 2021\n",
"\n",
"**TODO: Regexy way to get rid of the '\\QX-YYYY -\\QX YYYY' tail**\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "75500327-82fe-4b41-9ee9-9303d72e3cb0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 1: NI House Price Index, Standardised Price and Quarterly and Annual Change | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Year | \n",
" Quarter | \n",
" NI House Price Index | \n",
" NI House Standardised Price | \n",
" Quarterly Change | \n",
" Annual Change | \n",
" NI | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 2005 | \n",
" Q1 | \n",
" 100.883607 | \n",
" 111920.268199 | \n",
" NaN | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" Q2 | \n",
" 104.564663 | \n",
" 116004.031639 | \n",
" 0.036488 | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 4 | \n",
" NaN | \n",
" Q3 | \n",
" 111.219 | \n",
" 123386.352673 | \n",
" 0.063638 | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 89 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 90 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 91 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 92 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 93 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 100 | \n",
" 0.0 | \n",
"
\n",
" \n",
"
\n",
"
94 rows × 8 columns
\n",
"
"
],
"text/plain": [
" Table 1: NI House Price Index, Standardised Price and Quarterly and Annual Change \\\n",
"0 Back to contents \n",
"1 Year \n",
"2 2005 \n",
"3 NaN \n",
"4 NaN \n",
".. ... \n",
"89 NaN \n",
"90 NaN \n",
"91 NaN \n",
"92 NaN \n",
"93 NaN \n",
"\n",
" Unnamed: 1 Unnamed: 2 Unnamed: 3 \\\n",
"0 NaN NaN NaN \n",
"1 Quarter NI House Price Index NI House Standardised Price \n",
"2 Q1 100.883607 111920.268199 \n",
"3 Q2 104.564663 116004.031639 \n",
"4 Q3 111.219 123386.352673 \n",
".. ... ... ... \n",
"89 NaN NaN NaN \n",
"90 NaN NaN NaN \n",
"91 NaN NaN NaN \n",
"92 NaN NaN NaN \n",
"93 NaN NaN NaN \n",
"\n",
" Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 \n",
"0 NaN NaN NaN NaN \n",
"1 Quarterly Change Annual Change NI NaN \n",
"2 NaN NaN 100 0.0 \n",
"3 0.036488 NaN 100 0.0 \n",
"4 0.063638 NaN 100 0.0 \n",
".. ... ... ... ... \n",
"89 NaN NaN 100 0.0 \n",
"90 NaN NaN 100 0.0 \n",
"91 NaN NaN 100 0.0 \n",
"92 NaN NaN 100 0.0 \n",
"93 NaN NaN 100 0.0 \n",
"\n",
"[94 rows x 8 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source_df['Table 1']"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "83343de7-a9ad-4916-b7a4-3467ae0b7cab",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" NI House Price Index | \n",
" NI House Standardised Price | \n",
" Quarterly Change | \n",
" Annual Change | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 100.883607 | \n",
" 111920.268199 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 104.564663 | \n",
" 116004.031639 | \n",
" 0.036488 | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 111.219000 | \n",
" 123386.352673 | \n",
" 0.063638 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 115.083964 | \n",
" 127674.143865 | \n",
" 0.034751 | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 118.354129 | \n",
" 131302.064422 | \n",
" 0.028415 | \n",
" 0.173175 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 132.931827 | \n",
" 147474.561707 | \n",
" 0.026103 | \n",
" 0.052326 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 134.382831 | \n",
" 149084.306040 | \n",
" 0.010915 | \n",
" 0.059421 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 139.105050 | \n",
" 154323.134643 | \n",
" 0.035140 | \n",
" 0.095724 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 143.346066 | \n",
" 159028.118093 | \n",
" 0.030488 | \n",
" 0.106491 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 143.456594 | \n",
" 159150.737832 | \n",
" 0.000771 | \n",
" 0.079174 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 7 columns
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter NI House Price Index NI House Standardised Price \\\n",
"0 2005Q1 2005 Q1 100.883607 111920.268199 \n",
"1 2005Q2 2005 Q2 104.564663 116004.031639 \n",
"2 2005Q3 2005 Q3 111.219000 123386.352673 \n",
"3 2005Q4 2005 Q4 115.083964 127674.143865 \n",
"4 2006Q1 2006 Q1 118.354129 131302.064422 \n",
".. ... ... ... ... ... \n",
"63 2020Q4 2020 Q4 132.931827 147474.561707 \n",
"64 2021Q1 2021 Q1 134.382831 149084.306040 \n",
"65 2021Q2 2021 Q2 139.105050 154323.134643 \n",
"66 2021Q3 2021 Q3 143.346066 159028.118093 \n",
"67 2021Q4 2021 Q4 143.456594 159150.737832 \n",
"\n",
"1 Quarterly Change Annual Change \n",
"0 NaN NaN \n",
"1 0.036488 NaN \n",
"2 0.063638 NaN \n",
"3 0.034751 NaN \n",
"4 0.028415 0.173175 \n",
".. ... ... \n",
"63 0.026103 0.052326 \n",
"64 0.010915 0.059421 \n",
"65 0.035140 0.095724 \n",
"66 0.030488 0.106491 \n",
"67 0.000771 0.079174 \n",
"\n",
"[68 rows x 7 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def basic_cleanup(df:pd.DataFrame, offset=1)->pd.DataFrame:\n",
" df = df.copy() \n",
" # Re-header from row 1 (which was row 3 in excel)\n",
" new_header = df.iloc[offset]\n",
" df = df.iloc[offset+1:]\n",
" df.columns = new_header\n",
" \n",
" # remove 'NaN' trailing columns\n",
" df = df[df.columns[pd.notna(df.columns)]]\n",
" \n",
" # 'NI' is a usually hidden column that appears to be a checksum; \n",
" #if it's all there and all 100, remove it, otherwise, complain.\n",
" # (Note, need to change this 'if' logic to just 'if there's a \n",
" # column with all 100's, but cross that bridge later)\n",
" if 'NI' in df:\n",
" assert df['NI'].all() and df['NI'].mean() == 100, \"Not all values in df['NI'] == 100\"\n",
" df = df.drop('NI', axis=1)\n",
" \n",
" # Strip rows below the first all-nan row, if there is one \n",
" # (Otherwise this truncates the tables as there is no\n",
" # idxmax in the table of all 'false's)\n",
" if any(df.isna().all(axis=1)):\n",
" idx_first_bad_row = df.isna().all(axis=1).idxmax()\n",
" df = df.loc[:idx_first_bad_row-1]\n",
" \n",
" # By Inspection, other tables use 'Sale Year' and 'Sale Quarter'\n",
" if set(df.keys()).issuperset({'Sale Year','Sale Quarter'}):\n",
" df = df.rename(columns = {\n",
" 'Sale Year':'Year',\n",
" 'Sale Quarter': 'Quarter'\n",
" })\n",
" \n",
" # For 'Year','Quarter' indexed pages, there is an implied Year\n",
" # in Q2/4, so fill it downwards\n",
" if set(df.keys()).issuperset({'Year','Quarter'}):\n",
" df['Year'] = df['Year'].astype(float).fillna(method='ffill').astype(int)\n",
" \n",
" # In Pandas we can represent Y/Q combinations as proper datetimes\n",
" #https://stackoverflow.com/questions/53898482/clean-way-to-convert-quarterly-periods-to-datetime-in-pandas \n",
" df.insert(loc=0,\n",
" column='Period',\n",
" value=pd.PeriodIndex(df.apply(lambda r:f'{r.Year}-{r.Quarter}', axis=1), freq='Q')\n",
" )\n",
" \n",
" # reset index, try to fix dtypes, etc, (this should be the last \n",
" # operation before returning!\n",
" df = df.reset_index(drop=True).infer_objects() \n",
" \n",
" return df\n",
"\n",
"df = basic_cleanup(source_df['Table 1'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "faab32b5-3bd8-44ca-a5d1-92af1ecafe54",
"metadata": {},
"outputs": [],
"source": [
"dest_df = {\n",
" 'Table 1': basic_cleanup(source_df['Table 1'])\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "11469cc0-fa2e-4b53-935e-df4b7150031e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"32"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len([k for k in source_df.keys() if k.startswith('Table')])"
]
},
{
"cell_type": "markdown",
"id": "0c8b2a5f-a115-4966-8fc2-f7b7021f3934",
"metadata": {},
"source": [
"One down, 31 to go..."
]
},
{
"cell_type": "markdown",
"id": "beb88bdc-efdf-4a8b-8022-2563f8da141f",
"metadata": {},
"source": [
"### Table 2: NI HPI & Standardised Price Statistics by Property Type Q4 2021'"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "2586bcdf-9fa6-4f78-be7f-8784c3d41263",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Property Type | \n",
" Index\\n(Quarter 4 2021) | \n",
" Percentage Change on Previous Quarter | \n",
" Percentage Change over 12 months | \n",
" Standardised Price\\n(Quarter 4 2021) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Detached | \n",
" 143.488806 | \n",
" 0.008491 | \n",
" 0.093110 | \n",
" 241131.373512 | \n",
"
\n",
" \n",
" 1 | \n",
" Semi-Detached | \n",
" 140.680694 | \n",
" 0.004211 | \n",
" 0.076953 | \n",
" 153718.543755 | \n",
"
\n",
" \n",
" 2 | \n",
" Terrace | \n",
" 149.564169 | \n",
" -0.009577 | \n",
" 0.078758 | \n",
" 112831.710806 | \n",
"
\n",
" \n",
" 3 | \n",
" Apartment | \n",
" 133.376791 | \n",
" -0.014732 | \n",
" 0.032761 | \n",
" 116554.228620 | \n",
"
\n",
" \n",
" 4 | \n",
" All | \n",
" 143.456594 | \n",
" 0.000771 | \n",
" 0.079174 | \n",
" 159150.737832 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"1 Property Type Index\\n(Quarter 4 2021) \\\n",
"0 Detached 143.488806 \n",
"1 Semi-Detached 140.680694 \n",
"2 Terrace 149.564169 \n",
"3 Apartment 133.376791 \n",
"4 All 143.456594 \n",
"\n",
"1 Percentage Change on Previous Quarter Percentage Change over 12 months \\\n",
"0 0.008491 0.093110 \n",
"1 0.004211 0.076953 \n",
"2 -0.009577 0.078758 \n",
"3 -0.014732 0.032761 \n",
"4 0.000771 0.079174 \n",
"\n",
"1 Standardised Price\\n(Quarter 4 2021) \n",
"0 241131.373512 \n",
"1 153718.543755 \n",
"2 112831.710806 \n",
"3 116554.228620 \n",
"4 159150.737832 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = basic_cleanup(source_df['Table 2'])\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "c703d087-f9ff-4d62-b7df-2433df7ec0b3",
"metadata": {},
"source": [
"Those '\\n (Quarter 4 2021)' entries are unnecessary, so _for this table_, lets clear them"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1184f9cb-6ba6-4bd3-b0b7-96f6a1799c4b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Property Type | \n",
" Index | \n",
" Percentage Change on Previous Quarter | \n",
" Percentage Change over 12 months | \n",
" Standardised Price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Detached | \n",
" 143.488806 | \n",
" 0.008491 | \n",
" 0.093110 | \n",
" 241131.373512 | \n",
"
\n",
" \n",
" 1 | \n",
" Semi-Detached | \n",
" 140.680694 | \n",
" 0.004211 | \n",
" 0.076953 | \n",
" 153718.543755 | \n",
"
\n",
" \n",
" 2 | \n",
" Terrace | \n",
" 149.564169 | \n",
" -0.009577 | \n",
" 0.078758 | \n",
" 112831.710806 | \n",
"
\n",
" \n",
" 3 | \n",
" Apartment | \n",
" 133.376791 | \n",
" -0.014732 | \n",
" 0.032761 | \n",
" 116554.228620 | \n",
"
\n",
" \n",
" 4 | \n",
" All | \n",
" 143.456594 | \n",
" 0.000771 | \n",
" 0.079174 | \n",
" 159150.737832 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Property Type Index Percentage Change on Previous Quarter \\\n",
"0 Detached 143.488806 0.008491 \n",
"1 Semi-Detached 140.680694 0.004211 \n",
"2 Terrace 149.564169 -0.009577 \n",
"3 Apartment 133.376791 -0.014732 \n",
"4 All 143.456594 0.000771 \n",
"\n",
" Percentage Change over 12 months Standardised Price \n",
"0 0.093110 241131.373512 \n",
"1 0.076953 153718.543755 \n",
"2 0.078758 112831.710806 \n",
"3 0.032761 116554.228620 \n",
"4 0.079174 159150.737832 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns = [c.split('\\n')[0] for c in df.columns]\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "dff292d5-b7f0-490f-a08e-fb37bab65c68",
"metadata": {},
"outputs": [],
"source": [
"dest_df['Table 2'] = df"
]
},
{
"cell_type": "markdown",
"id": "ec33c515-bbbb-4eb9-adbe-f779b97c7d6a",
"metadata": {},
"source": [
"### Table 2a: NI Detached Property Price Index Q1 2005 - Q4 2021\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "619a2568-4588-4445-bb45-288b097309c5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" NI Detached Property Price Index | \n",
" NI Detached Property Standardised Price | \n",
" Quarterly Change | \n",
" Annual Change | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 95.465560 | \n",
" 160428.832662 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 100.974498 | \n",
" 169686.542965 | \n",
" 0.057706 | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 107.526236 | \n",
" 180696.666810 | \n",
" 0.064885 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 110.279730 | \n",
" 185323.883533 | \n",
" 0.025608 | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 112.270506 | \n",
" 188669.361197 | \n",
" 0.018052 | \n",
" 0.176032 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 131.266614 | \n",
" 220592.113069 | \n",
" 0.026393 | \n",
" 0.055357 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 133.814014 | \n",
" 224872.989982 | \n",
" 0.019406 | \n",
" 0.071429 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 139.682380 | \n",
" 234734.715703 | \n",
" 0.043855 | \n",
" 0.129844 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 142.280745 | \n",
" 239101.239764 | \n",
" 0.018602 | \n",
" 0.112515 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 143.488806 | \n",
" 241131.373512 | \n",
" 0.008491 | \n",
" 0.093110 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 7 columns
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter NI Detached Property Price Index \\\n",
"0 2005Q1 2005 Q1 95.465560 \n",
"1 2005Q2 2005 Q2 100.974498 \n",
"2 2005Q3 2005 Q3 107.526236 \n",
"3 2005Q4 2005 Q4 110.279730 \n",
"4 2006Q1 2006 Q1 112.270506 \n",
".. ... ... ... ... \n",
"63 2020Q4 2020 Q4 131.266614 \n",
"64 2021Q1 2021 Q1 133.814014 \n",
"65 2021Q2 2021 Q2 139.682380 \n",
"66 2021Q3 2021 Q3 142.280745 \n",
"67 2021Q4 2021 Q4 143.488806 \n",
"\n",
"1 NI Detached Property Standardised Price Quarterly Change Annual Change \n",
"0 160428.832662 NaN NaN \n",
"1 169686.542965 0.057706 NaN \n",
"2 180696.666810 0.064885 NaN \n",
"3 185323.883533 0.025608 NaN \n",
"4 188669.361197 0.018052 0.176032 \n",
".. ... ... ... \n",
"63 220592.113069 0.026393 0.055357 \n",
"64 224872.989982 0.019406 0.071429 \n",
"65 234734.715703 0.043855 0.129844 \n",
"66 239101.239764 0.018602 0.112515 \n",
"67 241131.373512 0.008491 0.093110 \n",
"\n",
"[68 rows x 7 columns]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = basic_cleanup(source_df['Table 2a'])\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "d2e247f5-e259-4e29-a72e-40c7bbf9cff9",
"metadata": {},
"source": [
"### Table 2x: NI XXX Property Price Index Q1 2005 - Q4 2021\n",
"\n",
"This table structure is consistent against the rest of the Table 2x cohort; mapping to the Property Types listed in Table 2.\n",
"\n",
"For the time being, we can ignore these, but this will probably become a pain later on..."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "5219443b-9f78-4ae5-863b-a4ab4a42b61d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Detached\n",
"1 Semi-Detached\n",
"2 Terrace\n",
"3 Apartment\n",
"4 All\n",
"Name: Property Type, dtype: object"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 2']['Property Type']"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "8e7a3dad-9912-4016-9f2b-1bed2aa44e98",
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"table2s = re.compile('Table 2[a-z]')\n",
"assert table2s.match('Table 2') is None, 'Table 2 is matching itself!'\n",
"assert table2s.match('Table 20') is None, 'Table 2 is greedy!'\n",
"assert table2s.match('Table 2z') is not None, 'Table 2 is matching incorrectly!'"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "4933aef8-d4c6-4833-b301-d1905b94e76e",
"metadata": {},
"outputs": [],
"source": [
"table2s = re.compile('Table 2[a-z]')\n",
"for table in source_df:\n",
" if table2s.match(table):\n",
" dest_df[table] = basic_cleanup(source_df[table])"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "d5f2ba02-1b03-4e6d-93e4-895cc00fb294",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6, 26)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "2c45a8b2-1782-41ea-a802-0cd6fd76a168",
"metadata": {},
"source": [
"6 down, 26 to go. \n",
"\n",
"### Table 3: NI HPI & Standardised Price Statistics by New/Existing Resold Dwelling Type Q4 2021\n",
"\n",
"These appear to be a similar structure of the Table 2's... hopefully"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "086ec10d-5303-43bb-8575-073b1eab7bde",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Property Type | \n",
" Index | \n",
" Percentage Change on Previous Quarter | \n",
" Percentage Change over 12 months | \n",
" Standardised Price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" New | \n",
" 141.769973 | \n",
" 0.024877 | \n",
" 0.072609 | \n",
" 185966.524090 | \n",
"
\n",
" \n",
" 1 | \n",
" Existing Resold | \n",
" 143.518977 | \n",
" -0.004918 | \n",
" 0.080771 | \n",
" 152275.828046 | \n",
"
\n",
" \n",
" 2 | \n",
" All | \n",
" 143.456594 | \n",
" 0.000771 | \n",
" 0.079174 | \n",
" 159150.737832 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Property Type Index Percentage Change on Previous Quarter \\\n",
"0 New 141.769973 0.024877 \n",
"1 Existing Resold 143.518977 -0.004918 \n",
"2 All 143.456594 0.000771 \n",
"\n",
" Percentage Change over 12 months Standardised Price \n",
"0 0.072609 185966.524090 \n",
"1 0.080771 152275.828046 \n",
"2 0.079174 159150.737832 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = basic_cleanup(source_df['Table 3'])\n",
"df.columns = [c.split('\\n')[0] for c in df.columns] # Stolen from Table 2 Treatment\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "12219ff6-bc08-462b-8108-2608674bc33d",
"metadata": {},
"outputs": [],
"source": [
"dest_df['Table 3'] = df"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "02ee1cd8-a519-4257-8bc4-a08e8d46fa9b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" NI New Dwellings Price Index | \n",
" NI New Dwellings Standardised Price | \n",
" Quarterly Change | \n",
" Annual Change | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 95.804706 | \n",
" 125671.662611 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 101.229223 | \n",
" 132787.263460 | \n",
" 0.056621 | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 106.243580 | \n",
" 139364.837967 | \n",
" 0.049535 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 110.118105 | \n",
" 144447.239874 | \n",
" 0.036468 | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 113.624410 | \n",
" 149046.629634 | \n",
" 0.031841 | \n",
" 0.186000 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 132.173052 | \n",
" 173377.779440 | \n",
" 0.004103 | \n",
" 0.036125 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 133.772562 | \n",
" 175475.933612 | \n",
" 0.012102 | \n",
" 0.027916 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 136.969311 | \n",
" 179669.264190 | \n",
" 0.023897 | \n",
" 0.046474 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 138.328776 | \n",
" 181452.540106 | \n",
" 0.009925 | \n",
" 0.050867 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 141.769973 | \n",
" 185966.524090 | \n",
" 0.024877 | \n",
" 0.072609 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 7 columns
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter NI New Dwellings Price Index \\\n",
"0 2005Q1 2005 Q1 95.804706 \n",
"1 2005Q2 2005 Q2 101.229223 \n",
"2 2005Q3 2005 Q3 106.243580 \n",
"3 2005Q4 2005 Q4 110.118105 \n",
"4 2006Q1 2006 Q1 113.624410 \n",
".. ... ... ... ... \n",
"63 2020Q4 2020 Q4 132.173052 \n",
"64 2021Q1 2021 Q1 133.772562 \n",
"65 2021Q2 2021 Q2 136.969311 \n",
"66 2021Q3 2021 Q3 138.328776 \n",
"67 2021Q4 2021 Q4 141.769973 \n",
"\n",
"1 NI New Dwellings Standardised Price Quarterly Change Annual Change \n",
"0 125671.662611 NaN NaN \n",
"1 132787.263460 0.056621 NaN \n",
"2 139364.837967 0.049535 NaN \n",
"3 144447.239874 0.036468 NaN \n",
"4 149046.629634 0.031841 0.186000 \n",
".. ... ... ... \n",
"63 173377.779440 0.004103 0.036125 \n",
"64 175475.933612 0.012102 0.027916 \n",
"65 179669.264190 0.023897 0.046474 \n",
"66 181452.540106 0.009925 0.050867 \n",
"67 185966.524090 0.024877 0.072609 \n",
"\n",
"[68 rows x 7 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = basic_cleanup(source_df['Table 3a'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "e8512fc1-67b5-43ec-8cc2-49b02c1f4752",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(10, 22)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table3s = re.compile('Table 3[a-z]')\n",
"for table in source_df:\n",
" if table3s.match(table):\n",
" dest_df[table] = basic_cleanup(source_df[table])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "7dfe9510-9de1-470c-97b6-e982de44b774",
"metadata": {},
"source": [
"### Table 4: Number of Verified Residential Property Sales Q1 2005 - Q4 2021\n",
"\n",
"Table 4 is not looking great "
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "0373b3d4-9b6c-4d55-aab1-6fec84bb89f5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 4: Number of Verified Residential Property Sales | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Verified Sales = Sales matched to a property i... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" Sale Year | \n",
" Sale Quarter | \n",
" Detached | \n",
" Semi-Detached | \n",
" Terrace | \n",
" Apartment | \n",
" Total | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 2005\\n | \n",
" Quarter 1 | \n",
" 809 | \n",
" 894 | \n",
" 1035 | \n",
" 198 | \n",
" 2936 | \n",
" NaN | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 84 | \n",
" 2021 | \n",
" Quarter 1 | \n",
" 2509 | \n",
" 2477 | \n",
" 1962 | \n",
" 561 | \n",
" 7509 | \n",
" NaN | \n",
"
\n",
" \n",
" 85 | \n",
" NaN | \n",
" Quarter 2 | \n",
" 2668 | \n",
" 2613 | \n",
" 2056 | \n",
" 604 | \n",
" 7941 | \n",
" NaN | \n",
"
\n",
" \n",
" 86 | \n",
" NaN | \n",
" Quarter 3 | \n",
" 2519 | \n",
" 2797 | \n",
" 2220 | \n",
" 633 | \n",
" 8169 | \n",
" Please note this figure is provisional and wil... | \n",
"
\n",
" \n",
" 87 | \n",
" NaN | \n",
" Quarter 4 | \n",
" 1478 | \n",
" 2100 | \n",
" 2057 | \n",
" 515 | \n",
" 6150 | \n",
" and new dwellings sold in this quarter being a... | \n",
"
\n",
" \n",
" 88 | \n",
" NaN | \n",
" 2021 Total | \n",
" 9174 | \n",
" 9987 | \n",
" 8295 | \n",
" 2313 | \n",
" 29769 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
89 rows × 8 columns
\n",
"
"
],
"text/plain": [
" Table 4: Number of Verified Residential Property Sales Unnamed: 1 \\\n",
"0 Verified Sales = Sales matched to a property i... NaN \n",
"1 Please note figures for the 2 most recent quar... NaN \n",
"2 Back to contents NaN \n",
"3 Sale Year Sale Quarter \n",
"4 2005\\n Quarter 1 \n",
".. ... ... \n",
"84 2021 Quarter 1 \n",
"85 NaN Quarter 2 \n",
"86 NaN Quarter 3 \n",
"87 NaN Quarter 4 \n",
"88 NaN 2021 Total \n",
"\n",
" Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 \\\n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN \n",
"3 Detached Semi-Detached Terrace Apartment Total \n",
"4 809 894 1035 198 2936 \n",
".. ... ... ... ... ... \n",
"84 2509 2477 1962 561 7509 \n",
"85 2668 2613 2056 604 7941 \n",
"86 2519 2797 2220 633 8169 \n",
"87 1478 2100 2057 515 6150 \n",
"88 9174 9987 8295 2313 29769 \n",
"\n",
" Unnamed: 7 \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
".. ... \n",
"84 NaN \n",
"85 NaN \n",
"86 Please note this figure is provisional and wil... \n",
"87 and new dwellings sold in this quarter being a... \n",
"88 NaN \n",
"\n",
"[89 rows x 8 columns]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = source_df['Table 4']\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "a4cf8323-d247-42fe-b5cc-48de150a01c2",
"metadata": {},
"source": [
"Of note; new offset for the header row at index 3 instead of index 1, due to lots of fluff at the start that is probably not going to be consistent between reports so that will almost certainly mess up my day in a few months. \n",
"\n",
"Also, **Quarter dates** have now been shifted into 'Quarter 1' instead of 'Q1', which ... meh 🤷♂️. More Egrigiously, it looks like **'\\n' has leaked into some Sales Year values**. Funtimes. \n",
"\n",
"Finally, and possibly most annoying, the introduction of **partial total lines** is going to throw things off, and this isn't a validation study, to stuff-em\n",
"\n",
"In an effort not to over-complicate `basic_cleanup`, we can try and clean these table specific issues first; "
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "fee88d14-a2b7-4405-838d-ac503b3ec88e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 4: Number of Verified Residential Property Sales | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Verified Sales = Sales matched to a property i... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" Sale Year | \n",
" Sale Quarter | \n",
" Detached | \n",
" Semi-Detached | \n",
" Terrace | \n",
" Apartment | \n",
" Total | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 2005\\n | \n",
" Q1 | \n",
" 809 | \n",
" 894 | \n",
" 1035 | \n",
" 198 | \n",
" 2936 | \n",
" NaN | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 84 | \n",
" 2021 | \n",
" Q1 | \n",
" 2509 | \n",
" 2477 | \n",
" 1962 | \n",
" 561 | \n",
" 7509 | \n",
" NaN | \n",
"
\n",
" \n",
" 85 | \n",
" NaN | \n",
" Q2 | \n",
" 2668 | \n",
" 2613 | \n",
" 2056 | \n",
" 604 | \n",
" 7941 | \n",
" NaN | \n",
"
\n",
" \n",
" 86 | \n",
" NaN | \n",
" Q3 | \n",
" 2519 | \n",
" 2797 | \n",
" 2220 | \n",
" 633 | \n",
" 8169 | \n",
" Please note this figure is provisional and wil... | \n",
"
\n",
" \n",
" 87 | \n",
" NaN | \n",
" Q4 | \n",
" 1478 | \n",
" 2100 | \n",
" 2057 | \n",
" 515 | \n",
" 6150 | \n",
" and new dwellings sold in this quarter being a... | \n",
"
\n",
" \n",
" 88 | \n",
" NaN | \n",
" 2021 Total | \n",
" 9174 | \n",
" 9987 | \n",
" 8295 | \n",
" 2313 | \n",
" 29769 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
89 rows × 8 columns
\n",
"
"
],
"text/plain": [
" Table 4: Number of Verified Residential Property Sales Unnamed: 1 \\\n",
"0 Verified Sales = Sales matched to a property i... NaN \n",
"1 Please note figures for the 2 most recent quar... NaN \n",
"2 Back to contents NaN \n",
"3 Sale Year Sale Quarter \n",
"4 2005\\n Q1 \n",
".. ... ... \n",
"84 2021 Q1 \n",
"85 NaN Q2 \n",
"86 NaN Q3 \n",
"87 NaN Q4 \n",
"88 NaN 2021 Total \n",
"\n",
" Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 \\\n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN \n",
"3 Detached Semi-Detached Terrace Apartment Total \n",
"4 809 894 1035 198 2936 \n",
".. ... ... ... ... ... \n",
"84 2509 2477 1962 561 7509 \n",
"85 2668 2613 2056 604 7941 \n",
"86 2519 2797 2220 633 8169 \n",
"87 1478 2100 2057 515 6150 \n",
"88 9174 9987 8295 2313 29769 \n",
"\n",
" Unnamed: 7 \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
".. ... \n",
"84 NaN \n",
"85 NaN \n",
"86 Please note this figure is provisional and wil... \n",
"87 and new dwellings sold in this quarter being a... \n",
"88 NaN \n",
"\n",
"[89 rows x 8 columns]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:,1]=df.iloc[:,1].str.replace('Quarter ([1-4])',r'Q\\1', regex=True)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "a2ff2bd2-0c96-4edc-9a95-d68eff6d63a6",
"metadata": {},
"outputs": [],
"source": [
"df=df[~df.iloc[:,1].str.contains('Total').fillna(False)]"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "7ae73897-7b8d-4574-bce7-ee14c570a1ac",
"metadata": {},
"outputs": [],
"source": [
"# Lose the year new-lines (needs astype because non str lines are\n",
"# correctly inferred to be ints, so .str methods nan-out\n",
"with pd.option_context('mode.chained_assignment',None):\n",
" df.iloc[:,0]=df.iloc[:,0].astype(str).str.replace('\\n','') "
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "5def352d-2fb6-427e-828f-9e771c6e5269",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 4: Number of Verified Residential Property Sales | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Verified Sales = Sales matched to a property i... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" Sale Year | \n",
" Sale Quarter | \n",
" Detached | \n",
" Semi-Detached | \n",
" Terrace | \n",
" Apartment | \n",
" Total | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 2005 | \n",
" Q1 | \n",
" 809 | \n",
" 894 | \n",
" 1035 | \n",
" 198 | \n",
" 2936 | \n",
" NaN | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 82 | \n",
" nan | \n",
" Q4 | \n",
" 2808 | \n",
" 2944 | \n",
" 2170 | \n",
" 555 | \n",
" 8477 | \n",
" NaN | \n",
"
\n",
" \n",
" 84 | \n",
" 2021 | \n",
" Q1 | \n",
" 2509 | \n",
" 2477 | \n",
" 1962 | \n",
" 561 | \n",
" 7509 | \n",
" NaN | \n",
"
\n",
" \n",
" 85 | \n",
" nan | \n",
" Q2 | \n",
" 2668 | \n",
" 2613 | \n",
" 2056 | \n",
" 604 | \n",
" 7941 | \n",
" NaN | \n",
"
\n",
" \n",
" 86 | \n",
" nan | \n",
" Q3 | \n",
" 2519 | \n",
" 2797 | \n",
" 2220 | \n",
" 633 | \n",
" 8169 | \n",
" Please note this figure is provisional and wil... | \n",
"
\n",
" \n",
" 87 | \n",
" nan | \n",
" Q4 | \n",
" 1478 | \n",
" 2100 | \n",
" 2057 | \n",
" 515 | \n",
" 6150 | \n",
" and new dwellings sold in this quarter being a... | \n",
"
\n",
" \n",
"
\n",
"
72 rows × 8 columns
\n",
"
"
],
"text/plain": [
" Table 4: Number of Verified Residential Property Sales Unnamed: 1 \\\n",
"0 Verified Sales = Sales matched to a property i... NaN \n",
"1 Please note figures for the 2 most recent quar... NaN \n",
"2 Back to contents NaN \n",
"3 Sale Year Sale Quarter \n",
"4 2005 Q1 \n",
".. ... ... \n",
"82 nan Q4 \n",
"84 2021 Q1 \n",
"85 nan Q2 \n",
"86 nan Q3 \n",
"87 nan Q4 \n",
"\n",
" Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 \\\n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN \n",
"3 Detached Semi-Detached Terrace Apartment Total \n",
"4 809 894 1035 198 2936 \n",
".. ... ... ... ... ... \n",
"82 2808 2944 2170 555 8477 \n",
"84 2509 2477 1962 561 7509 \n",
"85 2668 2613 2056 604 7941 \n",
"86 2519 2797 2220 633 8169 \n",
"87 1478 2100 2057 515 6150 \n",
"\n",
" Unnamed: 7 \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
".. ... \n",
"82 NaN \n",
"84 NaN \n",
"85 NaN \n",
"86 Please note this figure is provisional and wil... \n",
"87 and new dwellings sold in this quarter being a... \n",
"\n",
"[72 rows x 8 columns]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "a88ccc09-261f-43f0-80bd-f8cbafd8848d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 3 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Detached | \n",
" Semi-Detached | \n",
" Terrace | \n",
" Apartment | \n",
" Total | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 809 | \n",
" 894 | \n",
" 1035 | \n",
" 198 | \n",
" 2936 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 2208 | \n",
" 2474 | \n",
" 2808 | \n",
" 483 | \n",
" 7973 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 2297 | \n",
" 2655 | \n",
" 2952 | \n",
" 539 | \n",
" 8443 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 2498 | \n",
" 3003 | \n",
" 3492 | \n",
" 631 | \n",
" 9624 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 2185 | \n",
" 2650 | \n",
" 3158 | \n",
" 594 | \n",
" 8587 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 2808 | \n",
" 2944 | \n",
" 2170 | \n",
" 555 | \n",
" 8477 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 2509 | \n",
" 2477 | \n",
" 1962 | \n",
" 561 | \n",
" 7509 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 2668 | \n",
" 2613 | \n",
" 2056 | \n",
" 604 | \n",
" 7941 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 2519 | \n",
" 2797 | \n",
" 2220 | \n",
" 633 | \n",
" 8169 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 1478 | \n",
" 2100 | \n",
" 2057 | \n",
" 515 | \n",
" 6150 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 8 columns
\n",
"
"
],
"text/plain": [
"3 Period Year Quarter Detached Semi-Detached Terrace Apartment Total\n",
"0 2005Q1 2005 Q1 809 894 1035 198 2936\n",
"1 2005Q2 2005 Q2 2208 2474 2808 483 7973\n",
"2 2005Q3 2005 Q3 2297 2655 2952 539 8443\n",
"3 2005Q4 2005 Q4 2498 3003 3492 631 9624\n",
"4 2006Q1 2006 Q1 2185 2650 3158 594 8587\n",
".. ... ... ... ... ... ... ... ...\n",
"63 2020Q4 2020 Q4 2808 2944 2170 555 8477\n",
"64 2021Q1 2021 Q1 2509 2477 1962 561 7509\n",
"65 2021Q2 2021 Q2 2668 2613 2056 604 7941\n",
"66 2021Q3 2021 Q3 2519 2797 2220 633 8169\n",
"67 2021Q4 2021 Q4 1478 2100 2057 515 6150\n",
"\n",
"[68 rows x 8 columns]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basic_cleanup(df, offset=3)"
]
},
{
"cell_type": "markdown",
"id": "b71342c8-bccb-4b9c-8e74-210d9b6e84a3",
"metadata": {},
"source": [
"Thats awkward enough to get it's own function...\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "081cec92-bab6-4d7e-a6d7-db8d640a6def",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 3 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Detached | \n",
" Semi-Detached | \n",
" Terrace | \n",
" Apartment | \n",
" Total | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 809 | \n",
" 894 | \n",
" 1035 | \n",
" 198 | \n",
" 2936 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 2208 | \n",
" 2474 | \n",
" 2808 | \n",
" 483 | \n",
" 7973 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 2297 | \n",
" 2655 | \n",
" 2952 | \n",
" 539 | \n",
" 8443 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 2498 | \n",
" 3003 | \n",
" 3492 | \n",
" 631 | \n",
" 9624 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 2185 | \n",
" 2650 | \n",
" 3158 | \n",
" 594 | \n",
" 8587 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 2808 | \n",
" 2944 | \n",
" 2170 | \n",
" 555 | \n",
" 8477 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 2509 | \n",
" 2477 | \n",
" 1962 | \n",
" 561 | \n",
" 7509 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 2668 | \n",
" 2613 | \n",
" 2056 | \n",
" 604 | \n",
" 7941 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 2519 | \n",
" 2797 | \n",
" 2220 | \n",
" 633 | \n",
" 8169 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 1478 | \n",
" 2100 | \n",
" 2057 | \n",
" 515 | \n",
" 6150 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 8 columns
\n",
"
"
],
"text/plain": [
"3 Period Year Quarter Detached Semi-Detached Terrace Apartment Total\n",
"0 2005Q1 2005 Q1 809 894 1035 198 2936\n",
"1 2005Q2 2005 Q2 2208 2474 2808 483 7973\n",
"2 2005Q3 2005 Q3 2297 2655 2952 539 8443\n",
"3 2005Q4 2005 Q4 2498 3003 3492 631 9624\n",
"4 2006Q1 2006 Q1 2185 2650 3158 594 8587\n",
".. ... ... ... ... ... ... ... ...\n",
"63 2020Q4 2020 Q4 2808 2944 2170 555 8477\n",
"64 2021Q1 2021 Q1 2509 2477 1962 561 7509\n",
"65 2021Q2 2021 Q2 2668 2613 2056 604 7941\n",
"66 2021Q3 2021 Q3 2519 2797 2220 633 8169\n",
"67 2021Q4 2021 Q4 1478 2100 2057 515 6150\n",
"\n",
"[68 rows x 8 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def cleanup_table_4(df):\n",
" \"\"\"\n",
" Table 4: Number of Verified Residential Property Sales\n",
" * Regex 'Quarter X' to 'QX' in future 'Sales Quarter' column\n",
" * Drop Year Total rows\n",
" * Clear any Newlines from the future 'Sales Year' column\n",
" * call `basic_cleanup` with offset=3\n",
" \"\"\"\n",
" df.iloc[:,1]=df.iloc[:,1].str.replace('Quarter ([1-4])',r'Q\\1', regex=True)\n",
" df=df[~df.iloc[:,1].str.contains('Total').fillna(False)]\n",
" # Lose the year new-lines (needs astype because non str lines are\n",
" # correctly inferred to be ints, so .str methods nan-out\n",
" with pd.option_context('mode.chained_assignment',None):\n",
" df.iloc[:,0]=df.iloc[:,0].astype(str).str.replace('\\n','') \n",
" return basic_cleanup(df, offset=3)\n",
"\n",
"cleanup_table_4(source_df['Table 4'].copy())"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "f1bc6437-16c5-4619-b077-3ee809d9687a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(11, 21)"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 4'] = cleanup_table_4(source_df['Table 4'])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "fb04e8db-1e18-417d-8b14-649bcac94283",
"metadata": {},
"source": [
"### Table 5: HPI & Standardised Price for each Local Government District in NI\n",
"\n",
"This _nearly works_ but structurally requires a multi-index column to make sense...."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "47a58c84-898f-47cf-b212-b26a24ec6083",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Antrim and Newtownabbey HPI | \n",
" Antrim and Newtownabbey Standardised Price | \n",
" Ards and North Down HPI | \n",
" Ards and North Down Standardised Price | \n",
" Armagh City, Banbridge and Craigavon HPI | \n",
" Armagh City, Banbridge and Craigavon Standardised Price | \n",
" Belfast HPI | \n",
" ... | \n",
" Fermanagh and Omagh HPI | \n",
" Fermanagh and Omagh Standardised Price | \n",
" Lisburn and Castlereagh HPI | \n",
" Lisburn and Castlereagh Standardised Price | \n",
" Mid and East Antrim HPI | \n",
" Mid and East Antrim Standardised Price | \n",
" Mid Ulster Standardised HPI | \n",
" Mid Ulster Standardised Price | \n",
" Newry, Mourne and Down HPI | \n",
" Newry, Mourne and Down Standardised Price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 99.903277 | \n",
" 114851.528270 | \n",
" 97.150602 | \n",
" 130398.569667 | \n",
" 102.245597 | \n",
" 100785.145986 | \n",
" 99.839849 | \n",
" ... | \n",
" 109.429237 | \n",
" 104874.980231 | \n",
" 95.958322 | \n",
" 128828.327513 | \n",
" 102.246427 | \n",
" 105865.408901 | \n",
" 102.714778 | \n",
" 114882.211239 | \n",
" 100.810773 | \n",
" 113420.880186 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 99.723509 | \n",
" 114644.862732 | \n",
" 100.794472 | \n",
" 137133.037807 | \n",
" 106.325843 | \n",
" 104807.109982 | \n",
" 100.589870 | \n",
" ... | \n",
" 117.239850 | \n",
" 112360.529330 | \n",
" 100.164437 | \n",
" 134475.225477 | \n",
" 104.443325 | \n",
" 108140.065924 | \n",
" 110.386311 | \n",
" 123462.501283 | \n",
" 111.965743 | \n",
" 125971.191415 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 107.940849 | \n",
" 124091.740608 | \n",
" 102.167971 | \n",
" 137133.037807 | \n",
" 110.006212 | \n",
" 108434.910333 | \n",
" 109.614861 | \n",
" ... | \n",
" 125.900145 | \n",
" 120660.397585 | \n",
" 106.757895 | \n",
" 143327.237126 | \n",
" 112.748278 | \n",
" 116738.970434 | \n",
" 117.595723 | \n",
" 131525.929577 | \n",
" 117.235685 | \n",
" 131900.333698 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 111.934696 | \n",
" 128683.175719 | \n",
" 106.396379 | \n",
" 142808.538807 | \n",
" 116.073031 | \n",
" 114415.072260 | \n",
" 110.728237 | \n",
" ... | \n",
" 130.781315 | \n",
" 125338.422216 | \n",
" 111.307116 | \n",
" 149434.769200 | \n",
" 114.584090 | \n",
" 118639.759900 | \n",
" 121.851999 | \n",
" 136286.397473 | \n",
" 123.628047 | \n",
" 139092.296651 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 113.494351 | \n",
" 130476.197845 | \n",
" 109.206160 | \n",
" 146579.915492 | \n",
" 121.831058 | \n",
" 120090.852733 | \n",
" 112.326582 | \n",
" ... | \n",
" 135.555749 | \n",
" 129914.152078 | \n",
" 110.539212 | \n",
" 148403.823796 | \n",
" 115.878975 | \n",
" 119980.477260 | \n",
" 129.544702 | \n",
" 144890.365875 | \n",
" 127.449613 | \n",
" 143391.890242 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 131.045293 | \n",
" 150653.238745 | \n",
" 123.824862 | \n",
" 166201.593253 | \n",
" 130.857866 | \n",
" 128988.723586 | \n",
" 133.525177 | \n",
" ... | \n",
" 142.381814 | \n",
" 136456.127817 | \n",
" 129.797418 | \n",
" 174258.823716 | \n",
" 130.813697 | \n",
" 135443.809729 | \n",
" 124.617456 | \n",
" 139379.446212 | \n",
" 138.815696 | \n",
" 156179.721555 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 133.481101 | \n",
" 153453.510344 | \n",
" 128.398202 | \n",
" 172340.072904 | \n",
" 130.382658 | \n",
" 128520.303209 | \n",
" 135.257679 | \n",
" ... | \n",
" 141.464114 | \n",
" 135576.621629 | \n",
" 131.166305 | \n",
" 176096.615474 | \n",
" 131.268719 | \n",
" 135914.936888 | \n",
" 127.657615 | \n",
" 142779.737045 | \n",
" 138.481347 | \n",
" 155803.549899 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 137.827568 | \n",
" 158450.326506 | \n",
" 130.543464 | \n",
" 175219.510303 | \n",
" 137.439265 | \n",
" 135476.115278 | \n",
" 138.558752 | \n",
" ... | \n",
" 146.059468 | \n",
" 139980.724158 | \n",
" 134.815374 | \n",
" 180995.654429 | \n",
" 138.663140 | \n",
" 143571.081234 | \n",
" 129.231669 | \n",
" 144540.250870 | \n",
" 145.000829 | \n",
" 163138.533592 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 141.167257 | \n",
" 162289.724156 | \n",
" 134.116570 | \n",
" 180015.444071 | \n",
" 143.522521 | \n",
" 141472.480114 | \n",
" 142.250634 | \n",
" ... | \n",
" 149.743153 | \n",
" 143511.101233 | \n",
" 136.178634 | \n",
" 182825.891020 | \n",
" 140.051373 | \n",
" 145008.450168 | \n",
" 135.759680 | \n",
" 151841.560426 | \n",
" 152.177814 | \n",
" 171213.265699 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 141.900751 | \n",
" 163132.969278 | \n",
" 134.883745 | \n",
" 181045.170599 | \n",
" 140.957739 | \n",
" 138944.332704 | \n",
" 140.072076 | \n",
" ... | \n",
" 156.033991 | \n",
" 149540.124905 | \n",
" 137.683726 | \n",
" 184846.544332 | \n",
" 141.311289 | \n",
" 146312.960523 | \n",
" 136.847809 | \n",
" 153058.587031 | \n",
" 152.044892 | \n",
" 171063.717288 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 25 columns
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter Antrim and Newtownabbey HPI \\\n",
"0 2005Q1 2005 Q1 99.903277 \n",
"1 2005Q2 2005 Q2 99.723509 \n",
"2 2005Q3 2005 Q3 107.940849 \n",
"3 2005Q4 2005 Q4 111.934696 \n",
"4 2006Q1 2006 Q1 113.494351 \n",
".. ... ... ... ... \n",
"63 2020Q4 2020 Q4 131.045293 \n",
"64 2021Q1 2021 Q1 133.481101 \n",
"65 2021Q2 2021 Q2 137.827568 \n",
"66 2021Q3 2021 Q3 141.167257 \n",
"67 2021Q4 2021 Q4 141.900751 \n",
"\n",
"1 Antrim and Newtownabbey Standardised Price Ards and North Down HPI \\\n",
"0 114851.528270 97.150602 \n",
"1 114644.862732 100.794472 \n",
"2 124091.740608 102.167971 \n",
"3 128683.175719 106.396379 \n",
"4 130476.197845 109.206160 \n",
".. ... ... \n",
"63 150653.238745 123.824862 \n",
"64 153453.510344 128.398202 \n",
"65 158450.326506 130.543464 \n",
"66 162289.724156 134.116570 \n",
"67 163132.969278 134.883745 \n",
"\n",
"1 Ards and North Down Standardised Price \\\n",
"0 130398.569667 \n",
"1 137133.037807 \n",
"2 137133.037807 \n",
"3 142808.538807 \n",
"4 146579.915492 \n",
".. ... \n",
"63 166201.593253 \n",
"64 172340.072904 \n",
"65 175219.510303 \n",
"66 180015.444071 \n",
"67 181045.170599 \n",
"\n",
"1 Armagh City, Banbridge and Craigavon HPI \\\n",
"0 102.245597 \n",
"1 106.325843 \n",
"2 110.006212 \n",
"3 116.073031 \n",
"4 121.831058 \n",
".. ... \n",
"63 130.857866 \n",
"64 130.382658 \n",
"65 137.439265 \n",
"66 143.522521 \n",
"67 140.957739 \n",
"\n",
"1 Armagh City, Banbridge and Craigavon Standardised Price Belfast HPI ... \\\n",
"0 100785.145986 99.839849 ... \n",
"1 104807.109982 100.589870 ... \n",
"2 108434.910333 109.614861 ... \n",
"3 114415.072260 110.728237 ... \n",
"4 120090.852733 112.326582 ... \n",
".. ... ... ... \n",
"63 128988.723586 133.525177 ... \n",
"64 128520.303209 135.257679 ... \n",
"65 135476.115278 138.558752 ... \n",
"66 141472.480114 142.250634 ... \n",
"67 138944.332704 140.072076 ... \n",
"\n",
"1 Fermanagh and Omagh HPI Fermanagh and Omagh Standardised Price \\\n",
"0 109.429237 104874.980231 \n",
"1 117.239850 112360.529330 \n",
"2 125.900145 120660.397585 \n",
"3 130.781315 125338.422216 \n",
"4 135.555749 129914.152078 \n",
".. ... ... \n",
"63 142.381814 136456.127817 \n",
"64 141.464114 135576.621629 \n",
"65 146.059468 139980.724158 \n",
"66 149.743153 143511.101233 \n",
"67 156.033991 149540.124905 \n",
"\n",
"1 Lisburn and Castlereagh HPI Lisburn and Castlereagh Standardised Price \\\n",
"0 95.958322 128828.327513 \n",
"1 100.164437 134475.225477 \n",
"2 106.757895 143327.237126 \n",
"3 111.307116 149434.769200 \n",
"4 110.539212 148403.823796 \n",
".. ... ... \n",
"63 129.797418 174258.823716 \n",
"64 131.166305 176096.615474 \n",
"65 134.815374 180995.654429 \n",
"66 136.178634 182825.891020 \n",
"67 137.683726 184846.544332 \n",
"\n",
"1 Mid and East Antrim HPI Mid and East Antrim Standardised Price \\\n",
"0 102.246427 105865.408901 \n",
"1 104.443325 108140.065924 \n",
"2 112.748278 116738.970434 \n",
"3 114.584090 118639.759900 \n",
"4 115.878975 119980.477260 \n",
".. ... ... \n",
"63 130.813697 135443.809729 \n",
"64 131.268719 135914.936888 \n",
"65 138.663140 143571.081234 \n",
"66 140.051373 145008.450168 \n",
"67 141.311289 146312.960523 \n",
"\n",
"1 Mid Ulster Standardised HPI Mid Ulster Standardised Price \\\n",
"0 102.714778 114882.211239 \n",
"1 110.386311 123462.501283 \n",
"2 117.595723 131525.929577 \n",
"3 121.851999 136286.397473 \n",
"4 129.544702 144890.365875 \n",
".. ... ... \n",
"63 124.617456 139379.446212 \n",
"64 127.657615 142779.737045 \n",
"65 129.231669 144540.250870 \n",
"66 135.759680 151841.560426 \n",
"67 136.847809 153058.587031 \n",
"\n",
"1 Newry, Mourne and Down HPI Newry, Mourne and Down Standardised Price \n",
"0 100.810773 113420.880186 \n",
"1 111.965743 125971.191415 \n",
"2 117.235685 131900.333698 \n",
"3 123.628047 139092.296651 \n",
"4 127.449613 143391.890242 \n",
".. ... ... \n",
"63 138.815696 156179.721555 \n",
"64 138.481347 155803.549899 \n",
"65 145.000829 163138.533592 \n",
"66 152.177814 171213.265699 \n",
"67 152.044892 171063.717288 \n",
"\n",
"[68 rows x 25 columns]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = basic_cleanup(source_df['Table 5'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "611615db-cc22-4e38-9286-63146eb34a6c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Antrim and Newtownabbey', 'Ards and North Down',\n",
" 'Armagh City, Banbridge and Craigavon', 'Belfast',\n",
" 'Causeway Coast and Glens', 'Derry City and Strabane',\n",
" 'Fermanagh and Omagh', 'Lisburn and Castlereagh', 'Mid and East Antrim',\n",
" 'Mid Ulster Standardised', 'Mid Ulster', 'Newry, Mourne and Down'],\n",
" dtype='object', name=1)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Two inner-columns per LGD\n",
"lgds = df.columns[3:].str.replace(' HPI','').str.replace(' Standardised Price','').unique()\n",
"lgds"
]
},
{
"cell_type": "markdown",
"id": "b734fecd-c196-4a87-b37b-986dd4ba85d7",
"metadata": {},
"source": [
"For some reason; Mid-ulster has a 'Standardised HPI' which throws off the above trick, so we gotta make it ugly..."
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "63a476cc-660e-4882-9781-614e52d14f99",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Antrim and Newtownabbey', 'Ards and North Down',\n",
" 'Armagh City, Banbridge and Craigavon', 'Belfast',\n",
" 'Causeway Coast and Glens', 'Derry City and Strabane',\n",
" 'Fermanagh and Omagh', 'Lisburn and Castlereagh', 'Mid and East Antrim',\n",
" 'Mid Ulster', 'Newry, Mourne and Down'],\n",
" dtype='object', name=1)"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lgds = df.columns[3:].str.replace(' Standardised HPI',' HPI')\\\n",
" .str.replace(' HPI','')\\\n",
" .str.replace(' Standardised Price','').unique()\n",
"lgds"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "42da4ee0-5452-466c-9693-acfa73fda030",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" (Antrim and Newtownabbey, Index) | \n",
" (Antrim and Newtownabbey, Price) | \n",
" (Ards and North Down, Index) | \n",
" (Ards and North Down, Price) | \n",
" (Armagh City, Banbridge and Craigavon, Index) | \n",
" (Armagh City, Banbridge and Craigavon, Price) | \n",
" (Belfast, Index) | \n",
" ... | \n",
" (Fermanagh and Omagh, Index) | \n",
" (Fermanagh and Omagh, Price) | \n",
" (Lisburn and Castlereagh, Index) | \n",
" (Lisburn and Castlereagh, Price) | \n",
" (Mid and East Antrim, Index) | \n",
" (Mid and East Antrim, Price) | \n",
" (Mid Ulster, Index) | \n",
" (Mid Ulster, Price) | \n",
" (Newry, Mourne and Down, Index) | \n",
" (Newry, Mourne and Down, Price) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 99.903277 | \n",
" 114851.528270 | \n",
" 97.150602 | \n",
" 130398.569667 | \n",
" 102.245597 | \n",
" 100785.145986 | \n",
" 99.839849 | \n",
" ... | \n",
" 109.429237 | \n",
" 104874.980231 | \n",
" 95.958322 | \n",
" 128828.327513 | \n",
" 102.246427 | \n",
" 105865.408901 | \n",
" 102.714778 | \n",
" 114882.211239 | \n",
" 100.810773 | \n",
" 113420.880186 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 99.723509 | \n",
" 114644.862732 | \n",
" 100.794472 | \n",
" 137133.037807 | \n",
" 106.325843 | \n",
" 104807.109982 | \n",
" 100.589870 | \n",
" ... | \n",
" 117.239850 | \n",
" 112360.529330 | \n",
" 100.164437 | \n",
" 134475.225477 | \n",
" 104.443325 | \n",
" 108140.065924 | \n",
" 110.386311 | \n",
" 123462.501283 | \n",
" 111.965743 | \n",
" 125971.191415 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 107.940849 | \n",
" 124091.740608 | \n",
" 102.167971 | \n",
" 137133.037807 | \n",
" 110.006212 | \n",
" 108434.910333 | \n",
" 109.614861 | \n",
" ... | \n",
" 125.900145 | \n",
" 120660.397585 | \n",
" 106.757895 | \n",
" 143327.237126 | \n",
" 112.748278 | \n",
" 116738.970434 | \n",
" 117.595723 | \n",
" 131525.929577 | \n",
" 117.235685 | \n",
" 131900.333698 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 111.934696 | \n",
" 128683.175719 | \n",
" 106.396379 | \n",
" 142808.538807 | \n",
" 116.073031 | \n",
" 114415.072260 | \n",
" 110.728237 | \n",
" ... | \n",
" 130.781315 | \n",
" 125338.422216 | \n",
" 111.307116 | \n",
" 149434.769200 | \n",
" 114.584090 | \n",
" 118639.759900 | \n",
" 121.851999 | \n",
" 136286.397473 | \n",
" 123.628047 | \n",
" 139092.296651 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 113.494351 | \n",
" 130476.197845 | \n",
" 109.206160 | \n",
" 146579.915492 | \n",
" 121.831058 | \n",
" 120090.852733 | \n",
" 112.326582 | \n",
" ... | \n",
" 135.555749 | \n",
" 129914.152078 | \n",
" 110.539212 | \n",
" 148403.823796 | \n",
" 115.878975 | \n",
" 119980.477260 | \n",
" 129.544702 | \n",
" 144890.365875 | \n",
" 127.449613 | \n",
" 143391.890242 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 131.045293 | \n",
" 150653.238745 | \n",
" 123.824862 | \n",
" 166201.593253 | \n",
" 130.857866 | \n",
" 128988.723586 | \n",
" 133.525177 | \n",
" ... | \n",
" 142.381814 | \n",
" 136456.127817 | \n",
" 129.797418 | \n",
" 174258.823716 | \n",
" 130.813697 | \n",
" 135443.809729 | \n",
" 124.617456 | \n",
" 139379.446212 | \n",
" 138.815696 | \n",
" 156179.721555 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 133.481101 | \n",
" 153453.510344 | \n",
" 128.398202 | \n",
" 172340.072904 | \n",
" 130.382658 | \n",
" 128520.303209 | \n",
" 135.257679 | \n",
" ... | \n",
" 141.464114 | \n",
" 135576.621629 | \n",
" 131.166305 | \n",
" 176096.615474 | \n",
" 131.268719 | \n",
" 135914.936888 | \n",
" 127.657615 | \n",
" 142779.737045 | \n",
" 138.481347 | \n",
" 155803.549899 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 137.827568 | \n",
" 158450.326506 | \n",
" 130.543464 | \n",
" 175219.510303 | \n",
" 137.439265 | \n",
" 135476.115278 | \n",
" 138.558752 | \n",
" ... | \n",
" 146.059468 | \n",
" 139980.724158 | \n",
" 134.815374 | \n",
" 180995.654429 | \n",
" 138.663140 | \n",
" 143571.081234 | \n",
" 129.231669 | \n",
" 144540.250870 | \n",
" 145.000829 | \n",
" 163138.533592 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 141.167257 | \n",
" 162289.724156 | \n",
" 134.116570 | \n",
" 180015.444071 | \n",
" 143.522521 | \n",
" 141472.480114 | \n",
" 142.250634 | \n",
" ... | \n",
" 149.743153 | \n",
" 143511.101233 | \n",
" 136.178634 | \n",
" 182825.891020 | \n",
" 140.051373 | \n",
" 145008.450168 | \n",
" 135.759680 | \n",
" 151841.560426 | \n",
" 152.177814 | \n",
" 171213.265699 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 141.900751 | \n",
" 163132.969278 | \n",
" 134.883745 | \n",
" 181045.170599 | \n",
" 140.957739 | \n",
" 138944.332704 | \n",
" 140.072076 | \n",
" ... | \n",
" 156.033991 | \n",
" 149540.124905 | \n",
" 137.683726 | \n",
" 184846.544332 | \n",
" 141.311289 | \n",
" 146312.960523 | \n",
" 136.847809 | \n",
" 153058.587031 | \n",
" 152.044892 | \n",
" 171063.717288 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 25 columns
\n",
"
"
],
"text/plain": [
" Period Year Quarter (Antrim and Newtownabbey, Index) \\\n",
"0 2005Q1 2005 Q1 99.903277 \n",
"1 2005Q2 2005 Q2 99.723509 \n",
"2 2005Q3 2005 Q3 107.940849 \n",
"3 2005Q4 2005 Q4 111.934696 \n",
"4 2006Q1 2006 Q1 113.494351 \n",
".. ... ... ... ... \n",
"63 2020Q4 2020 Q4 131.045293 \n",
"64 2021Q1 2021 Q1 133.481101 \n",
"65 2021Q2 2021 Q2 137.827568 \n",
"66 2021Q3 2021 Q3 141.167257 \n",
"67 2021Q4 2021 Q4 141.900751 \n",
"\n",
" (Antrim and Newtownabbey, Price) (Ards and North Down, Index) \\\n",
"0 114851.528270 97.150602 \n",
"1 114644.862732 100.794472 \n",
"2 124091.740608 102.167971 \n",
"3 128683.175719 106.396379 \n",
"4 130476.197845 109.206160 \n",
".. ... ... \n",
"63 150653.238745 123.824862 \n",
"64 153453.510344 128.398202 \n",
"65 158450.326506 130.543464 \n",
"66 162289.724156 134.116570 \n",
"67 163132.969278 134.883745 \n",
"\n",
" (Ards and North Down, Price) \\\n",
"0 130398.569667 \n",
"1 137133.037807 \n",
"2 137133.037807 \n",
"3 142808.538807 \n",
"4 146579.915492 \n",
".. ... \n",
"63 166201.593253 \n",
"64 172340.072904 \n",
"65 175219.510303 \n",
"66 180015.444071 \n",
"67 181045.170599 \n",
"\n",
" (Armagh City, Banbridge and Craigavon, Index) \\\n",
"0 102.245597 \n",
"1 106.325843 \n",
"2 110.006212 \n",
"3 116.073031 \n",
"4 121.831058 \n",
".. ... \n",
"63 130.857866 \n",
"64 130.382658 \n",
"65 137.439265 \n",
"66 143.522521 \n",
"67 140.957739 \n",
"\n",
" (Armagh City, Banbridge and Craigavon, Price) (Belfast, Index) ... \\\n",
"0 100785.145986 99.839849 ... \n",
"1 104807.109982 100.589870 ... \n",
"2 108434.910333 109.614861 ... \n",
"3 114415.072260 110.728237 ... \n",
"4 120090.852733 112.326582 ... \n",
".. ... ... ... \n",
"63 128988.723586 133.525177 ... \n",
"64 128520.303209 135.257679 ... \n",
"65 135476.115278 138.558752 ... \n",
"66 141472.480114 142.250634 ... \n",
"67 138944.332704 140.072076 ... \n",
"\n",
" (Fermanagh and Omagh, Index) (Fermanagh and Omagh, Price) \\\n",
"0 109.429237 104874.980231 \n",
"1 117.239850 112360.529330 \n",
"2 125.900145 120660.397585 \n",
"3 130.781315 125338.422216 \n",
"4 135.555749 129914.152078 \n",
".. ... ... \n",
"63 142.381814 136456.127817 \n",
"64 141.464114 135576.621629 \n",
"65 146.059468 139980.724158 \n",
"66 149.743153 143511.101233 \n",
"67 156.033991 149540.124905 \n",
"\n",
" (Lisburn and Castlereagh, Index) (Lisburn and Castlereagh, Price) \\\n",
"0 95.958322 128828.327513 \n",
"1 100.164437 134475.225477 \n",
"2 106.757895 143327.237126 \n",
"3 111.307116 149434.769200 \n",
"4 110.539212 148403.823796 \n",
".. ... ... \n",
"63 129.797418 174258.823716 \n",
"64 131.166305 176096.615474 \n",
"65 134.815374 180995.654429 \n",
"66 136.178634 182825.891020 \n",
"67 137.683726 184846.544332 \n",
"\n",
" (Mid and East Antrim, Index) (Mid and East Antrim, Price) \\\n",
"0 102.246427 105865.408901 \n",
"1 104.443325 108140.065924 \n",
"2 112.748278 116738.970434 \n",
"3 114.584090 118639.759900 \n",
"4 115.878975 119980.477260 \n",
".. ... ... \n",
"63 130.813697 135443.809729 \n",
"64 131.268719 135914.936888 \n",
"65 138.663140 143571.081234 \n",
"66 140.051373 145008.450168 \n",
"67 141.311289 146312.960523 \n",
"\n",
" (Mid Ulster, Index) (Mid Ulster, Price) (Newry, Mourne and Down, Index) \\\n",
"0 102.714778 114882.211239 100.810773 \n",
"1 110.386311 123462.501283 111.965743 \n",
"2 117.595723 131525.929577 117.235685 \n",
"3 121.851999 136286.397473 123.628047 \n",
"4 129.544702 144890.365875 127.449613 \n",
".. ... ... ... \n",
"63 124.617456 139379.446212 138.815696 \n",
"64 127.657615 142779.737045 138.481347 \n",
"65 129.231669 144540.250870 145.000829 \n",
"66 135.759680 151841.560426 152.177814 \n",
"67 136.847809 153058.587031 152.044892 \n",
"\n",
" (Newry, Mourne and Down, Price) \n",
"0 113420.880186 \n",
"1 125971.191415 \n",
"2 131900.333698 \n",
"3 139092.296651 \n",
"4 143391.890242 \n",
".. ... \n",
"63 156179.721555 \n",
"64 155803.549899 \n",
"65 163138.533592 \n",
"66 171213.265699 \n",
"67 171063.717288 \n",
"\n",
"[68 rows x 25 columns]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns = [*df.columns[:3], *pd.MultiIndex.from_product([lgds,['Index','Price']], names=['LGD','Metric'])]\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "888c048a-ad14-4c9a-8f3e-1abe946483fe",
"metadata": {},
"source": [
"We _could_ turn this into a proper multiindex but it would mean pushing the Period/Year/Quarter columns into keys which would be inconsistent behaviour with the rest of the 'cleaned' dataset, so that can be a downstream problem; at least we've got the relevant metrics consistent!"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "d3b1d332-d808-466b-86c3-0eabb716f2f9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" (Antrim and Newtownabbey, Index) | \n",
" (Antrim and Newtownabbey, Price) | \n",
" (Ards and North Down, Index) | \n",
" (Ards and North Down, Price) | \n",
" (Armagh City, Banbridge and Craigavon, Index) | \n",
" (Armagh City, Banbridge and Craigavon, Price) | \n",
" (Belfast, Index) | \n",
" ... | \n",
" (Fermanagh and Omagh, Index) | \n",
" (Fermanagh and Omagh, Price) | \n",
" (Lisburn and Castlereagh, Index) | \n",
" (Lisburn and Castlereagh, Price) | \n",
" (Mid and East Antrim, Index) | \n",
" (Mid and East Antrim, Price) | \n",
" (Mid Ulster, Index) | \n",
" (Mid Ulster, Price) | \n",
" (Newry, Mourne and Down, Index) | \n",
" (Newry, Mourne and Down, Price) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 99.903277 | \n",
" 114851.528270 | \n",
" 97.150602 | \n",
" 130398.569667 | \n",
" 102.245597 | \n",
" 100785.145986 | \n",
" 99.839849 | \n",
" ... | \n",
" 109.429237 | \n",
" 104874.980231 | \n",
" 95.958322 | \n",
" 128828.327513 | \n",
" 102.246427 | \n",
" 105865.408901 | \n",
" 102.714778 | \n",
" 114882.211239 | \n",
" 100.810773 | \n",
" 113420.880186 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 99.723509 | \n",
" 114644.862732 | \n",
" 100.794472 | \n",
" 137133.037807 | \n",
" 106.325843 | \n",
" 104807.109982 | \n",
" 100.589870 | \n",
" ... | \n",
" 117.239850 | \n",
" 112360.529330 | \n",
" 100.164437 | \n",
" 134475.225477 | \n",
" 104.443325 | \n",
" 108140.065924 | \n",
" 110.386311 | \n",
" 123462.501283 | \n",
" 111.965743 | \n",
" 125971.191415 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 107.940849 | \n",
" 124091.740608 | \n",
" 102.167971 | \n",
" 137133.037807 | \n",
" 110.006212 | \n",
" 108434.910333 | \n",
" 109.614861 | \n",
" ... | \n",
" 125.900145 | \n",
" 120660.397585 | \n",
" 106.757895 | \n",
" 143327.237126 | \n",
" 112.748278 | \n",
" 116738.970434 | \n",
" 117.595723 | \n",
" 131525.929577 | \n",
" 117.235685 | \n",
" 131900.333698 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 111.934696 | \n",
" 128683.175719 | \n",
" 106.396379 | \n",
" 142808.538807 | \n",
" 116.073031 | \n",
" 114415.072260 | \n",
" 110.728237 | \n",
" ... | \n",
" 130.781315 | \n",
" 125338.422216 | \n",
" 111.307116 | \n",
" 149434.769200 | \n",
" 114.584090 | \n",
" 118639.759900 | \n",
" 121.851999 | \n",
" 136286.397473 | \n",
" 123.628047 | \n",
" 139092.296651 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 113.494351 | \n",
" 130476.197845 | \n",
" 109.206160 | \n",
" 146579.915492 | \n",
" 121.831058 | \n",
" 120090.852733 | \n",
" 112.326582 | \n",
" ... | \n",
" 135.555749 | \n",
" 129914.152078 | \n",
" 110.539212 | \n",
" 148403.823796 | \n",
" 115.878975 | \n",
" 119980.477260 | \n",
" 129.544702 | \n",
" 144890.365875 | \n",
" 127.449613 | \n",
" 143391.890242 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 131.045293 | \n",
" 150653.238745 | \n",
" 123.824862 | \n",
" 166201.593253 | \n",
" 130.857866 | \n",
" 128988.723586 | \n",
" 133.525177 | \n",
" ... | \n",
" 142.381814 | \n",
" 136456.127817 | \n",
" 129.797418 | \n",
" 174258.823716 | \n",
" 130.813697 | \n",
" 135443.809729 | \n",
" 124.617456 | \n",
" 139379.446212 | \n",
" 138.815696 | \n",
" 156179.721555 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 133.481101 | \n",
" 153453.510344 | \n",
" 128.398202 | \n",
" 172340.072904 | \n",
" 130.382658 | \n",
" 128520.303209 | \n",
" 135.257679 | \n",
" ... | \n",
" 141.464114 | \n",
" 135576.621629 | \n",
" 131.166305 | \n",
" 176096.615474 | \n",
" 131.268719 | \n",
" 135914.936888 | \n",
" 127.657615 | \n",
" 142779.737045 | \n",
" 138.481347 | \n",
" 155803.549899 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 137.827568 | \n",
" 158450.326506 | \n",
" 130.543464 | \n",
" 175219.510303 | \n",
" 137.439265 | \n",
" 135476.115278 | \n",
" 138.558752 | \n",
" ... | \n",
" 146.059468 | \n",
" 139980.724158 | \n",
" 134.815374 | \n",
" 180995.654429 | \n",
" 138.663140 | \n",
" 143571.081234 | \n",
" 129.231669 | \n",
" 144540.250870 | \n",
" 145.000829 | \n",
" 163138.533592 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 141.167257 | \n",
" 162289.724156 | \n",
" 134.116570 | \n",
" 180015.444071 | \n",
" 143.522521 | \n",
" 141472.480114 | \n",
" 142.250634 | \n",
" ... | \n",
" 149.743153 | \n",
" 143511.101233 | \n",
" 136.178634 | \n",
" 182825.891020 | \n",
" 140.051373 | \n",
" 145008.450168 | \n",
" 135.759680 | \n",
" 151841.560426 | \n",
" 152.177814 | \n",
" 171213.265699 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 141.900751 | \n",
" 163132.969278 | \n",
" 134.883745 | \n",
" 181045.170599 | \n",
" 140.957739 | \n",
" 138944.332704 | \n",
" 140.072076 | \n",
" ... | \n",
" 156.033991 | \n",
" 149540.124905 | \n",
" 137.683726 | \n",
" 184846.544332 | \n",
" 141.311289 | \n",
" 146312.960523 | \n",
" 136.847809 | \n",
" 153058.587031 | \n",
" 152.044892 | \n",
" 171063.717288 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 25 columns
\n",
"
"
],
"text/plain": [
" Period Year Quarter (Antrim and Newtownabbey, Index) \\\n",
"0 2005Q1 2005 Q1 99.903277 \n",
"1 2005Q2 2005 Q2 99.723509 \n",
"2 2005Q3 2005 Q3 107.940849 \n",
"3 2005Q4 2005 Q4 111.934696 \n",
"4 2006Q1 2006 Q1 113.494351 \n",
".. ... ... ... ... \n",
"63 2020Q4 2020 Q4 131.045293 \n",
"64 2021Q1 2021 Q1 133.481101 \n",
"65 2021Q2 2021 Q2 137.827568 \n",
"66 2021Q3 2021 Q3 141.167257 \n",
"67 2021Q4 2021 Q4 141.900751 \n",
"\n",
" (Antrim and Newtownabbey, Price) (Ards and North Down, Index) \\\n",
"0 114851.528270 97.150602 \n",
"1 114644.862732 100.794472 \n",
"2 124091.740608 102.167971 \n",
"3 128683.175719 106.396379 \n",
"4 130476.197845 109.206160 \n",
".. ... ... \n",
"63 150653.238745 123.824862 \n",
"64 153453.510344 128.398202 \n",
"65 158450.326506 130.543464 \n",
"66 162289.724156 134.116570 \n",
"67 163132.969278 134.883745 \n",
"\n",
" (Ards and North Down, Price) \\\n",
"0 130398.569667 \n",
"1 137133.037807 \n",
"2 137133.037807 \n",
"3 142808.538807 \n",
"4 146579.915492 \n",
".. ... \n",
"63 166201.593253 \n",
"64 172340.072904 \n",
"65 175219.510303 \n",
"66 180015.444071 \n",
"67 181045.170599 \n",
"\n",
" (Armagh City, Banbridge and Craigavon, Index) \\\n",
"0 102.245597 \n",
"1 106.325843 \n",
"2 110.006212 \n",
"3 116.073031 \n",
"4 121.831058 \n",
".. ... \n",
"63 130.857866 \n",
"64 130.382658 \n",
"65 137.439265 \n",
"66 143.522521 \n",
"67 140.957739 \n",
"\n",
" (Armagh City, Banbridge and Craigavon, Price) (Belfast, Index) ... \\\n",
"0 100785.145986 99.839849 ... \n",
"1 104807.109982 100.589870 ... \n",
"2 108434.910333 109.614861 ... \n",
"3 114415.072260 110.728237 ... \n",
"4 120090.852733 112.326582 ... \n",
".. ... ... ... \n",
"63 128988.723586 133.525177 ... \n",
"64 128520.303209 135.257679 ... \n",
"65 135476.115278 138.558752 ... \n",
"66 141472.480114 142.250634 ... \n",
"67 138944.332704 140.072076 ... \n",
"\n",
" (Fermanagh and Omagh, Index) (Fermanagh and Omagh, Price) \\\n",
"0 109.429237 104874.980231 \n",
"1 117.239850 112360.529330 \n",
"2 125.900145 120660.397585 \n",
"3 130.781315 125338.422216 \n",
"4 135.555749 129914.152078 \n",
".. ... ... \n",
"63 142.381814 136456.127817 \n",
"64 141.464114 135576.621629 \n",
"65 146.059468 139980.724158 \n",
"66 149.743153 143511.101233 \n",
"67 156.033991 149540.124905 \n",
"\n",
" (Lisburn and Castlereagh, Index) (Lisburn and Castlereagh, Price) \\\n",
"0 95.958322 128828.327513 \n",
"1 100.164437 134475.225477 \n",
"2 106.757895 143327.237126 \n",
"3 111.307116 149434.769200 \n",
"4 110.539212 148403.823796 \n",
".. ... ... \n",
"63 129.797418 174258.823716 \n",
"64 131.166305 176096.615474 \n",
"65 134.815374 180995.654429 \n",
"66 136.178634 182825.891020 \n",
"67 137.683726 184846.544332 \n",
"\n",
" (Mid and East Antrim, Index) (Mid and East Antrim, Price) \\\n",
"0 102.246427 105865.408901 \n",
"1 104.443325 108140.065924 \n",
"2 112.748278 116738.970434 \n",
"3 114.584090 118639.759900 \n",
"4 115.878975 119980.477260 \n",
".. ... ... \n",
"63 130.813697 135443.809729 \n",
"64 131.268719 135914.936888 \n",
"65 138.663140 143571.081234 \n",
"66 140.051373 145008.450168 \n",
"67 141.311289 146312.960523 \n",
"\n",
" (Mid Ulster, Index) (Mid Ulster, Price) (Newry, Mourne and Down, Index) \\\n",
"0 102.714778 114882.211239 100.810773 \n",
"1 110.386311 123462.501283 111.965743 \n",
"2 117.595723 131525.929577 117.235685 \n",
"3 121.851999 136286.397473 123.628047 \n",
"4 129.544702 144890.365875 127.449613 \n",
".. ... ... ... \n",
"63 124.617456 139379.446212 138.815696 \n",
"64 127.657615 142779.737045 138.481347 \n",
"65 129.231669 144540.250870 145.000829 \n",
"66 135.759680 151841.560426 152.177814 \n",
"67 136.847809 153058.587031 152.044892 \n",
"\n",
" (Newry, Mourne and Down, Price) \n",
"0 113420.880186 \n",
"1 125971.191415 \n",
"2 131900.333698 \n",
"3 139092.296651 \n",
"4 143391.890242 \n",
".. ... \n",
"63 156179.721555 \n",
"64 155803.549899 \n",
"65 163138.533592 \n",
"66 171213.265699 \n",
"67 171063.717288 \n",
"\n",
"[68 rows x 25 columns]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def cleanup_table_5(df):\n",
" \"\"\"\n",
" Table 5: Standardised House Price & Index for each Local Government District Northern Ireland\n",
" * \n",
" \"\"\"\n",
" # Basic Cleanup first\n",
" df = basic_cleanup(df)\n",
" # Build multi-index of LGD / Metric [Index,Price]\n",
" # Two inner-columns per LGD\n",
" lgds = df.columns[3:].str.replace(' Standardised HPI',' HPI')\\\n",
" .str.replace(' HPI','')\\\n",
" .str.replace(' Standardised Price','')\\\n",
" .unique()\n",
" df.columns = [*df.columns[:3], *pd.MultiIndex.from_product([lgds,['Index','Price']], names=['LGD','Metric'])]\n",
" return df\n",
"\n",
"cleanup_table_5(source_df['Table 5'])"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "cee2ae3e-2ae7-4ecf-a57f-e7ad8ef10951",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(12, 20)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 5']=cleanup_table_5(source_df['Table 5'])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "21b52cab-3ecf-4c09-ae22-add37683444a",
"metadata": {},
"source": [
"### Table 5a: Number of Verified Residential Property Sales by Local Government District\n",
"\n",
"This one has a new problem; the Sale Year/Quarter is now squished together. This will do a few terrible things to our `basic_cleanup` so this needs to be done ahead of cleanup. \n",
"Also has annual total lines."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "07be1ece-5b36-45c4-a8a3-6dc9fce71826",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 5a: Number of Verified Residential Property Sales by Local Government District | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
" Unnamed: 8 | \n",
" Unnamed: 9 | \n",
" Unnamed: 10 | \n",
" Unnamed: 11 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" Sale Year/Quarter | \n",
" Antrim and Newtownabbey | \n",
" Ards and North Down | \n",
" Armagh City, Banbridge and Craigavon | \n",
" Belfast | \n",
" Causeway Coast and Glens | \n",
" Derry City and Strabane | \n",
" Fermanagh and Omagh | \n",
" Lisburn and Castlereagh | \n",
" Mid and East Antrim | \n",
" Mid Ulster | \n",
" Newry, Mourne and Down | \n",
"
\n",
" \n",
" 3 | \n",
" Q1 2005 | \n",
" 236 | \n",
" 320 | \n",
" 333 | \n",
" 623 | \n",
" 236 | \n",
" 226 | \n",
" 138 | \n",
" 219 | \n",
" 188 | \n",
" 176 | \n",
" 241 | \n",
"
\n",
" \n",
" 4 | \n",
" Q2 2005 | \n",
" 735 | \n",
" 857 | \n",
" 961 | \n",
" 1549 | \n",
" 712 | \n",
" 637 | \n",
" 316 | \n",
" 655 | \n",
" 618 | \n",
" 428 | \n",
" 505 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 85 | \n",
" Q3 2021 | \n",
" 739 | \n",
" 989 | \n",
" 931 | \n",
" 1584 | \n",
" 625 | \n",
" 485 | \n",
" 325 | \n",
" 869 | \n",
" 671 | \n",
" 377 | \n",
" 574 | \n",
"
\n",
" \n",
" 86 | \n",
" Q4 2021 | \n",
" 532 | \n",
" 702 | \n",
" 730 | \n",
" 1272 | \n",
" 417 | \n",
" 405 | \n",
" 250 | \n",
" 572 | \n",
" 474 | \n",
" 359 | \n",
" 437 | \n",
"
\n",
" \n",
" 87 | \n",
" 2021 Total | \n",
" 2647 | \n",
" 3685 | \n",
" 3333 | \n",
" 5934 | \n",
" 2327 | \n",
" 1803 | \n",
" 1181 | \n",
" 3053 | \n",
" 2346 | \n",
" 1460 | \n",
" 2000 | \n",
"
\n",
" \n",
" 88 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 89 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
90 rows × 12 columns
\n",
"
"
],
"text/plain": [
" Table 5a: Number of Verified Residential Property Sales by Local Government District \\\n",
"0 Please note figures for the 2 most recent quar... \n",
"1 Back to contents \n",
"2 Sale Year/Quarter \n",
"3 Q1 2005 \n",
"4 Q2 2005 \n",
".. ... \n",
"85 Q3 2021 \n",
"86 Q4 2021 \n",
"87 2021 Total \n",
"88 NaN \n",
"89 Please note figures for the 2 most recent quar... \n",
"\n",
" Unnamed: 1 Unnamed: 2 \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 Antrim and Newtownabbey Ards and North Down \n",
"3 236 320 \n",
"4 735 857 \n",
".. ... ... \n",
"85 739 989 \n",
"86 532 702 \n",
"87 2647 3685 \n",
"88 NaN NaN \n",
"89 NaN NaN \n",
"\n",
" Unnamed: 3 Unnamed: 4 Unnamed: 5 \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 Armagh City, Banbridge and Craigavon Belfast Causeway Coast and Glens \n",
"3 333 623 236 \n",
"4 961 1549 712 \n",
".. ... ... ... \n",
"85 931 1584 625 \n",
"86 730 1272 417 \n",
"87 3333 5934 2327 \n",
"88 NaN NaN NaN \n",
"89 NaN NaN NaN \n",
"\n",
" Unnamed: 6 Unnamed: 7 Unnamed: 8 \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 Derry City and Strabane Fermanagh and Omagh Lisburn and Castlereagh \n",
"3 226 138 219 \n",
"4 637 316 655 \n",
".. ... ... ... \n",
"85 485 325 869 \n",
"86 405 250 572 \n",
"87 1803 1181 3053 \n",
"88 NaN NaN NaN \n",
"89 NaN NaN NaN \n",
"\n",
" Unnamed: 9 Unnamed: 10 Unnamed: 11 \n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 Mid and East Antrim Mid Ulster Newry, Mourne and Down \n",
"3 188 176 241 \n",
"4 618 428 505 \n",
".. ... ... ... \n",
"85 671 377 574 \n",
"86 474 359 437 \n",
"87 2346 1460 2000 \n",
"88 NaN NaN NaN \n",
"89 NaN NaN NaN \n",
"\n",
"[90 rows x 12 columns]"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = source_df['Table 5a'].copy()\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "d2975a62-ec06-48b1-ad65-bd6051f6c881",
"metadata": {},
"outputs": [],
"source": [
"dates = df.iloc[:,0].str.extract('(Q[1-4]) ([0-9]{4})').rename(columns={0:'Quarter',1:'Year'})\n",
"for c in ['Quarter','Year']:# insert the dates in order, so they come out in reverse in the insert\n",
" df.insert(1,c,dates[c])\n",
" df.iloc[2,1]=c # Need to have the right colname for when `basic_cleanup` is called.\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "0dbd0401-3746-4e61-bf10-54d260cfc646",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 5a: Number of Verified Residential Property Sales by Local Government District | \n",
" Year | \n",
" Quarter | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
" Unnamed: 8 | \n",
" Unnamed: 9 | \n",
" Unnamed: 10 | \n",
" Unnamed: 11 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" Sale Year/Quarter | \n",
" Year | \n",
" Quarter | \n",
" Antrim and Newtownabbey | \n",
" Ards and North Down | \n",
" Armagh City, Banbridge and Craigavon | \n",
" Belfast | \n",
" Causeway Coast and Glens | \n",
" Derry City and Strabane | \n",
" Fermanagh and Omagh | \n",
" Lisburn and Castlereagh | \n",
" Mid and East Antrim | \n",
" Mid Ulster | \n",
" Newry, Mourne and Down | \n",
"
\n",
" \n",
" 3 | \n",
" Q1 2005 | \n",
" 2005 | \n",
" Q1 | \n",
" 236 | \n",
" 320 | \n",
" 333 | \n",
" 623 | \n",
" 236 | \n",
" 226 | \n",
" 138 | \n",
" 219 | \n",
" 188 | \n",
" 176 | \n",
" 241 | \n",
"
\n",
" \n",
" 4 | \n",
" Q2 2005 | \n",
" 2005 | \n",
" Q2 | \n",
" 735 | \n",
" 857 | \n",
" 961 | \n",
" 1549 | \n",
" 712 | \n",
" 637 | \n",
" 316 | \n",
" 655 | \n",
" 618 | \n",
" 428 | \n",
" 505 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 85 | \n",
" Q3 2021 | \n",
" 2021 | \n",
" Q3 | \n",
" 739 | \n",
" 989 | \n",
" 931 | \n",
" 1584 | \n",
" 625 | \n",
" 485 | \n",
" 325 | \n",
" 869 | \n",
" 671 | \n",
" 377 | \n",
" 574 | \n",
"
\n",
" \n",
" 86 | \n",
" Q4 2021 | \n",
" 2021 | \n",
" Q4 | \n",
" 532 | \n",
" 702 | \n",
" 730 | \n",
" 1272 | \n",
" 417 | \n",
" 405 | \n",
" 250 | \n",
" 572 | \n",
" 474 | \n",
" 359 | \n",
" 437 | \n",
"
\n",
" \n",
" 87 | \n",
" 2021 Total | \n",
" NaN | \n",
" NaN | \n",
" 2647 | \n",
" 3685 | \n",
" 3333 | \n",
" 5934 | \n",
" 2327 | \n",
" 1803 | \n",
" 1181 | \n",
" 3053 | \n",
" 2346 | \n",
" 1460 | \n",
" 2000 | \n",
"
\n",
" \n",
" 88 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 89 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
90 rows × 14 columns
\n",
"
"
],
"text/plain": [
" Table 5a: Number of Verified Residential Property Sales by Local Government District \\\n",
"0 Please note figures for the 2 most recent quar... \n",
"1 Back to contents \n",
"2 Sale Year/Quarter \n",
"3 Q1 2005 \n",
"4 Q2 2005 \n",
".. ... \n",
"85 Q3 2021 \n",
"86 Q4 2021 \n",
"87 2021 Total \n",
"88 NaN \n",
"89 Please note figures for the 2 most recent quar... \n",
"\n",
" Year Quarter Unnamed: 1 Unnamed: 2 \\\n",
"0 NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN \n",
"2 Year Quarter Antrim and Newtownabbey Ards and North Down \n",
"3 2005 Q1 236 320 \n",
"4 2005 Q2 735 857 \n",
".. ... ... ... ... \n",
"85 2021 Q3 739 989 \n",
"86 2021 Q4 532 702 \n",
"87 NaN NaN 2647 3685 \n",
"88 NaN NaN NaN NaN \n",
"89 NaN NaN NaN NaN \n",
"\n",
" Unnamed: 3 Unnamed: 4 Unnamed: 5 \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 Armagh City, Banbridge and Craigavon Belfast Causeway Coast and Glens \n",
"3 333 623 236 \n",
"4 961 1549 712 \n",
".. ... ... ... \n",
"85 931 1584 625 \n",
"86 730 1272 417 \n",
"87 3333 5934 2327 \n",
"88 NaN NaN NaN \n",
"89 NaN NaN NaN \n",
"\n",
" Unnamed: 6 Unnamed: 7 Unnamed: 8 \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 Derry City and Strabane Fermanagh and Omagh Lisburn and Castlereagh \n",
"3 226 138 219 \n",
"4 637 316 655 \n",
".. ... ... ... \n",
"85 485 325 869 \n",
"86 405 250 572 \n",
"87 1803 1181 3053 \n",
"88 NaN NaN NaN \n",
"89 NaN NaN NaN \n",
"\n",
" Unnamed: 9 Unnamed: 10 Unnamed: 11 \n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 Mid and East Antrim Mid Ulster Newry, Mourne and Down \n",
"3 188 176 241 \n",
"4 618 428 505 \n",
".. ... ... ... \n",
"85 671 377 574 \n",
"86 474 359 437 \n",
"87 2346 1460 2000 \n",
"88 NaN NaN NaN \n",
"89 NaN NaN NaN \n",
"\n",
"[90 rows x 14 columns]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[2,1]=c\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "4eac8523-a5cf-48c1-afb3-912c8c0e2cd5",
"metadata": {},
"outputs": [],
"source": [
"df=df[~df.iloc[:,0].str.contains('Total').fillna(False)]"
]
},
{
"cell_type": "markdown",
"id": "e1d84982-d88f-4ceb-b31d-b423b0ef90db",
"metadata": {},
"source": [
"df.iloc[1,2]=c"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "2313a66d-a591-421a-8003-dd0375582cdf",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 2 | \n",
" Period | \n",
" Sale Year/Quarter | \n",
" Year | \n",
" Quarter | \n",
" Antrim and Newtownabbey | \n",
" Ards and North Down | \n",
" Armagh City, Banbridge and Craigavon | \n",
" Belfast | \n",
" Causeway Coast and Glens | \n",
" Derry City and Strabane | \n",
" Fermanagh and Omagh | \n",
" Lisburn and Castlereagh | \n",
" Mid and East Antrim | \n",
" Mid Ulster | \n",
" Newry, Mourne and Down | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" Q1 2005 | \n",
" 2005 | \n",
" Q1 | \n",
" 236 | \n",
" 320 | \n",
" 333 | \n",
" 623 | \n",
" 236 | \n",
" 226 | \n",
" 138 | \n",
" 219 | \n",
" 188 | \n",
" 176 | \n",
" 241 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" Q2 2005 | \n",
" 2005 | \n",
" Q2 | \n",
" 735 | \n",
" 857 | \n",
" 961 | \n",
" 1549 | \n",
" 712 | \n",
" 637 | \n",
" 316 | \n",
" 655 | \n",
" 618 | \n",
" 428 | \n",
" 505 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" Q3 2005 | \n",
" 2005 | \n",
" Q3 | \n",
" 757 | \n",
" 960 | \n",
" 968 | \n",
" 1722 | \n",
" 714 | \n",
" 632 | \n",
" 365 | \n",
" 654 | \n",
" 686 | \n",
" 403 | \n",
" 582 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" Q4 2005 | \n",
" 2005 | \n",
" Q4 | \n",
" 893 | \n",
" 995 | \n",
" 1199 | \n",
" 1943 | \n",
" 834 | \n",
" 746 | \n",
" 385 | \n",
" 670 | \n",
" 759 | \n",
" 489 | \n",
" 711 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" Q1 2006 | \n",
" 2006 | \n",
" Q1 | \n",
" 761 | \n",
" 933 | \n",
" 1038 | \n",
" 1686 | \n",
" 763 | \n",
" 708 | \n",
" 348 | \n",
" 600 | \n",
" 668 | \n",
" 515 | \n",
" 567 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" Q4 2020 | \n",
" 2020 | \n",
" Q4 | \n",
" 756 | \n",
" 1052 | \n",
" 974 | \n",
" 1565 | \n",
" 728 | \n",
" 496 | \n",
" 336 | \n",
" 830 | \n",
" 685 | \n",
" 419 | \n",
" 636 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" Q1 2021 | \n",
" 2021 | \n",
" Q1 | \n",
" 652 | \n",
" 976 | \n",
" 849 | \n",
" 1497 | \n",
" 610 | \n",
" 466 | \n",
" 290 | \n",
" 762 | \n",
" 572 | \n",
" 349 | \n",
" 486 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" Q2 2021 | \n",
" 2021 | \n",
" Q2 | \n",
" 724 | \n",
" 1018 | \n",
" 823 | \n",
" 1581 | \n",
" 675 | \n",
" 447 | \n",
" 316 | \n",
" 850 | \n",
" 629 | \n",
" 375 | \n",
" 503 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" Q3 2021 | \n",
" 2021 | \n",
" Q3 | \n",
" 739 | \n",
" 989 | \n",
" 931 | \n",
" 1584 | \n",
" 625 | \n",
" 485 | \n",
" 325 | \n",
" 869 | \n",
" 671 | \n",
" 377 | \n",
" 574 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" Q4 2021 | \n",
" 2021 | \n",
" Q4 | \n",
" 532 | \n",
" 702 | \n",
" 730 | \n",
" 1272 | \n",
" 417 | \n",
" 405 | \n",
" 250 | \n",
" 572 | \n",
" 474 | \n",
" 359 | \n",
" 437 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 15 columns
\n",
"
"
],
"text/plain": [
"2 Period Sale Year/Quarter Year Quarter Antrim and Newtownabbey \\\n",
"0 2005Q1 Q1 2005 2005 Q1 236 \n",
"1 2005Q2 Q2 2005 2005 Q2 735 \n",
"2 2005Q3 Q3 2005 2005 Q3 757 \n",
"3 2005Q4 Q4 2005 2005 Q4 893 \n",
"4 2006Q1 Q1 2006 2006 Q1 761 \n",
".. ... ... ... ... ... \n",
"63 2020Q4 Q4 2020 2020 Q4 756 \n",
"64 2021Q1 Q1 2021 2021 Q1 652 \n",
"65 2021Q2 Q2 2021 2021 Q2 724 \n",
"66 2021Q3 Q3 2021 2021 Q3 739 \n",
"67 2021Q4 Q4 2021 2021 Q4 532 \n",
"\n",
"2 Ards and North Down Armagh City, Banbridge and Craigavon Belfast \\\n",
"0 320 333 623 \n",
"1 857 961 1549 \n",
"2 960 968 1722 \n",
"3 995 1199 1943 \n",
"4 933 1038 1686 \n",
".. ... ... ... \n",
"63 1052 974 1565 \n",
"64 976 849 1497 \n",
"65 1018 823 1581 \n",
"66 989 931 1584 \n",
"67 702 730 1272 \n",
"\n",
"2 Causeway Coast and Glens Derry City and Strabane Fermanagh and Omagh \\\n",
"0 236 226 138 \n",
"1 712 637 316 \n",
"2 714 632 365 \n",
"3 834 746 385 \n",
"4 763 708 348 \n",
".. ... ... ... \n",
"63 728 496 336 \n",
"64 610 466 290 \n",
"65 675 447 316 \n",
"66 625 485 325 \n",
"67 417 405 250 \n",
"\n",
"2 Lisburn and Castlereagh Mid and East Antrim Mid Ulster \\\n",
"0 219 188 176 \n",
"1 655 618 428 \n",
"2 654 686 403 \n",
"3 670 759 489 \n",
"4 600 668 515 \n",
".. ... ... ... \n",
"63 830 685 419 \n",
"64 762 572 349 \n",
"65 850 629 375 \n",
"66 869 671 377 \n",
"67 572 474 359 \n",
"\n",
"2 Newry, Mourne and Down \n",
"0 241 \n",
"1 505 \n",
"2 582 \n",
"3 711 \n",
"4 567 \n",
".. ... \n",
"63 636 \n",
"64 486 \n",
"65 503 \n",
"66 574 \n",
"67 437 \n",
"\n",
"[68 rows x 15 columns]"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basic_cleanup(df,offset=2)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "2ff3fa71-9114-4d27-b2db-f270c96203d2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 2 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Antrim and Newtownabbey | \n",
" Ards and North Down | \n",
" Armagh City, Banbridge and Craigavon | \n",
" Belfast | \n",
" Causeway Coast and Glens | \n",
" Derry City and Strabane | \n",
" Fermanagh and Omagh | \n",
" Lisburn and Castlereagh | \n",
" Mid and East Antrim | \n",
" Mid Ulster | \n",
" Newry, Mourne and Down | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 236 | \n",
" 320 | \n",
" 333 | \n",
" 623 | \n",
" 236 | \n",
" 226 | \n",
" 138 | \n",
" 219 | \n",
" 188 | \n",
" 176 | \n",
" 241 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 735 | \n",
" 857 | \n",
" 961 | \n",
" 1549 | \n",
" 712 | \n",
" 637 | \n",
" 316 | \n",
" 655 | \n",
" 618 | \n",
" 428 | \n",
" 505 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 757 | \n",
" 960 | \n",
" 968 | \n",
" 1722 | \n",
" 714 | \n",
" 632 | \n",
" 365 | \n",
" 654 | \n",
" 686 | \n",
" 403 | \n",
" 582 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 893 | \n",
" 995 | \n",
" 1199 | \n",
" 1943 | \n",
" 834 | \n",
" 746 | \n",
" 385 | \n",
" 670 | \n",
" 759 | \n",
" 489 | \n",
" 711 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 761 | \n",
" 933 | \n",
" 1038 | \n",
" 1686 | \n",
" 763 | \n",
" 708 | \n",
" 348 | \n",
" 600 | \n",
" 668 | \n",
" 515 | \n",
" 567 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 756 | \n",
" 1052 | \n",
" 974 | \n",
" 1565 | \n",
" 728 | \n",
" 496 | \n",
" 336 | \n",
" 830 | \n",
" 685 | \n",
" 419 | \n",
" 636 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 652 | \n",
" 976 | \n",
" 849 | \n",
" 1497 | \n",
" 610 | \n",
" 466 | \n",
" 290 | \n",
" 762 | \n",
" 572 | \n",
" 349 | \n",
" 486 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 724 | \n",
" 1018 | \n",
" 823 | \n",
" 1581 | \n",
" 675 | \n",
" 447 | \n",
" 316 | \n",
" 850 | \n",
" 629 | \n",
" 375 | \n",
" 503 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 739 | \n",
" 989 | \n",
" 931 | \n",
" 1584 | \n",
" 625 | \n",
" 485 | \n",
" 325 | \n",
" 869 | \n",
" 671 | \n",
" 377 | \n",
" 574 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 532 | \n",
" 702 | \n",
" 730 | \n",
" 1272 | \n",
" 417 | \n",
" 405 | \n",
" 250 | \n",
" 572 | \n",
" 474 | \n",
" 359 | \n",
" 437 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 14 columns
\n",
"
"
],
"text/plain": [
"2 Period Year Quarter Antrim and Newtownabbey Ards and North Down \\\n",
"0 2005Q1 2005 Q1 236 320 \n",
"1 2005Q2 2005 Q2 735 857 \n",
"2 2005Q3 2005 Q3 757 960 \n",
"3 2005Q4 2005 Q4 893 995 \n",
"4 2006Q1 2006 Q1 761 933 \n",
".. ... ... ... ... ... \n",
"63 2020Q4 2020 Q4 756 1052 \n",
"64 2021Q1 2021 Q1 652 976 \n",
"65 2021Q2 2021 Q2 724 1018 \n",
"66 2021Q3 2021 Q3 739 989 \n",
"67 2021Q4 2021 Q4 532 702 \n",
"\n",
"2 Armagh City, Banbridge and Craigavon Belfast Causeway Coast and Glens \\\n",
"0 333 623 236 \n",
"1 961 1549 712 \n",
"2 968 1722 714 \n",
"3 1199 1943 834 \n",
"4 1038 1686 763 \n",
".. ... ... ... \n",
"63 974 1565 728 \n",
"64 849 1497 610 \n",
"65 823 1581 675 \n",
"66 931 1584 625 \n",
"67 730 1272 417 \n",
"\n",
"2 Derry City and Strabane Fermanagh and Omagh Lisburn and Castlereagh \\\n",
"0 226 138 219 \n",
"1 637 316 655 \n",
"2 632 365 654 \n",
"3 746 385 670 \n",
"4 708 348 600 \n",
".. ... ... ... \n",
"63 496 336 830 \n",
"64 466 290 762 \n",
"65 447 316 850 \n",
"66 485 325 869 \n",
"67 405 250 572 \n",
"\n",
"2 Mid and East Antrim Mid Ulster Newry, Mourne and Down \n",
"0 188 176 241 \n",
"1 618 428 505 \n",
"2 686 403 582 \n",
"3 759 489 711 \n",
"4 668 515 567 \n",
".. ... ... ... \n",
"63 685 419 636 \n",
"64 572 349 486 \n",
"65 629 375 503 \n",
"66 671 377 574 \n",
"67 474 359 437 \n",
"\n",
"[68 rows x 14 columns]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def cleanup_table_5a(df):\n",
" \"\"\"\n",
" Table 5a: Number of Verified Residential Property Sales by Local Government District\n",
" * Parse the 'Sale Year/Quarter' to two separate cols\n",
" * Insert future-headers for Quarter and Year cols\n",
" * Remove rows with 'total' in the first column\n",
" * Disregard the 'Sale Year/Quarter' column\n",
" * perform `basic_cleanup` with offset=2\n",
" \"\"\"\n",
" # Safety first\n",
" df=df.copy()\n",
" \n",
" # Extract 'Quarter' and 'Year' columns from the future 'Sale Year/Quarter' column\n",
" dates = df.iloc[:,0].str.extract('(Q[1-4]) ([0-9]{4})').rename(columns={0:'Quarter',1:'Year'})\n",
" for c in ['Quarter','Year']:# insert the dates in order, so they come out in reverse in the insert\n",
" df.insert(1,c,dates[c])\n",
" df.iloc[2,1]=c # Need to have the right colname for when `basic_cleanup` is called.\n",
"\n",
" # Remove 'total' rows from the future 'Sale Year/Quarter' column\n",
" df=df[~df.iloc[:,0].str.contains('Total').fillna(False)]\n",
" \n",
" # Remove the 'Sale Year/Quarter' column all together\n",
" df = df.iloc[:,1:]\n",
" \n",
" # Standard cleanup\n",
" df = basic_cleanup(df, offset=2)\n",
" \n",
" return df\n",
"\n",
"cleanup_table_5a(source_df['Table 5a'])"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "234f1b68-3972-4615-8976-21585f838349",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(13, 19)"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 5a']=cleanup_table_5a(source_df['Table 5a'])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "018d28d1-40ef-4c5b-ad45-6bf38d1c1c7c",
"metadata": {},
"source": [
"### Table 6: Standardised House Price & Index for all Urban and Rural areas in NI\n",
"\n",
"Wee buns, thankfully. Still mixing the 'HPI' vs 'Index', but that's a downstream problem "
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "71302516-377a-4255-97e0-14b670f13761",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Urban Areas HPI | \n",
" Urban Areas Standardised Price | \n",
" Rural Areas HPI | \n",
" Rural Areas Standardised Price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 101.309947 | \n",
" 107723.320891 | \n",
" 100.109860 | \n",
" 124292.601178 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 104.402908 | \n",
" 111012.079786 | \n",
" 105.467951 | \n",
" 138865.721275 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 111.163485 | \n",
" 118200.631818 | \n",
" 111.847591 | \n",
" 138865.721275 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 114.871996 | \n",
" 122143.908606 | \n",
" 116.175119 | \n",
" 144238.615701 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 118.187559 | \n",
" 125669.361667 | \n",
" 119.329374 | \n",
" 148154.818847 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 132.610763 | \n",
" 141005.619094 | \n",
" 133.854953 | \n",
" 166189.226014 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 134.077654 | \n",
" 142565.370205 | \n",
" 135.267264 | \n",
" 167942.698911 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 138.575881 | \n",
" 147348.355880 | \n",
" 140.501443 | \n",
" 174441.256673 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 142.840470 | \n",
" 151882.912133 | \n",
" 144.695321 | \n",
" 179648.216283 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 142.375033 | \n",
" 151388.010443 | \n",
" 146.115278 | \n",
" 181411.180623 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 7 columns
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter Urban Areas HPI Urban Areas Standardised Price \\\n",
"0 2005Q1 2005 Q1 101.309947 107723.320891 \n",
"1 2005Q2 2005 Q2 104.402908 111012.079786 \n",
"2 2005Q3 2005 Q3 111.163485 118200.631818 \n",
"3 2005Q4 2005 Q4 114.871996 122143.908606 \n",
"4 2006Q1 2006 Q1 118.187559 125669.361667 \n",
".. ... ... ... ... ... \n",
"63 2020Q4 2020 Q4 132.610763 141005.619094 \n",
"64 2021Q1 2021 Q1 134.077654 142565.370205 \n",
"65 2021Q2 2021 Q2 138.575881 147348.355880 \n",
"66 2021Q3 2021 Q3 142.840470 151882.912133 \n",
"67 2021Q4 2021 Q4 142.375033 151388.010443 \n",
"\n",
"1 Rural Areas HPI Rural Areas Standardised Price \n",
"0 100.109860 124292.601178 \n",
"1 105.467951 138865.721275 \n",
"2 111.847591 138865.721275 \n",
"3 116.175119 144238.615701 \n",
"4 119.329374 148154.818847 \n",
".. ... ... \n",
"63 133.854953 166189.226014 \n",
"64 135.267264 167942.698911 \n",
"65 140.501443 174441.256673 \n",
"66 144.695321 179648.216283 \n",
"67 146.115278 181411.180623 \n",
"\n",
"[68 rows x 7 columns]"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = basic_cleanup(source_df['Table 6'])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "98b33bcb-d20f-492e-8828-5d84eba1a04d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(14, 18)"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 6']=basic_cleanup(source_df['Table 6'])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "f47e5018-6eaa-404a-bbbd-5a861155519d",
"metadata": {},
"source": [
"### Table 7: Standardised House Price & Index for Rural Areas of Northern Ireland by drive times\n",
"\n",
"Nearly-wee-buns; but this one doesn't have Year or Quarter headers, and the extra `\\n (Ref: Q1 2015)` added, which will complicate downstream analysis if that changes over time..."
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "d3c9abab-7008-4d53-a94c-9429084a9340",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 7: Standardised House Price & Index for Rural Areas of Northern Ireland by drive times | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
" Unnamed: 8 | \n",
" Unnamed: 9 | \n",
" Unnamed: 10 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" Drive time within 20mins of town of 10,000 or ... | \n",
" Drive time within 20mins of town of 10,000 or ... | \n",
" Drive time outside 20mins of town of 10,000 or... | \n",
" Drive time outside 20mins of town of 10,000 or... | \n",
" Drive time within 1hr of Belfast Index | \n",
" Drive time within 1hr of Belfast Price\\n(Ref:... | \n",
" Drive time outside 1hr of Belfast Index | \n",
" Drive time outside 1hr of Belfast Price\\n(Ref... | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 2015 | \n",
" Q1 | \n",
" 100 | \n",
" 124898.676844 | \n",
" 100 | \n",
" 122528.427865 | \n",
" 100 | \n",
" 128955.274996 | \n",
" 100 | \n",
" 111866.40498 | \n",
" 100.0 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" Q2 | \n",
" 103.166882 | \n",
" 128854.070701 | \n",
" 103.003978 | \n",
" 126209.155363 | \n",
" 103.025069 | \n",
" 132856.260679 | \n",
" 103.349406 | \n",
" 115613.265107 | \n",
" 100.0 | \n",
"
\n",
" \n",
" 4 | \n",
" NaN | \n",
" Q3 | \n",
" 105.851629 | \n",
" 132207.28391 | \n",
" 105.619893 | \n",
" 129414.394046 | \n",
" 105.031061 | \n",
" 135443.093443 | \n",
" 107.811831 | \n",
" 120605.219276 | \n",
" 100.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Table 7: Standardised House Price & Index for Rural Areas of Northern Ireland by drive times \\\n",
"0 Back to contents \n",
"1 NaN \n",
"2 2015 \n",
"3 NaN \n",
"4 NaN \n",
"\n",
" Unnamed: 1 Unnamed: 2 \\\n",
"0 NaN NaN \n",
"1 NaN Drive time within 20mins of town of 10,000 or ... \n",
"2 Q1 100 \n",
"3 Q2 103.166882 \n",
"4 Q3 105.851629 \n",
"\n",
" Unnamed: 3 \\\n",
"0 NaN \n",
"1 Drive time within 20mins of town of 10,000 or ... \n",
"2 124898.676844 \n",
"3 128854.070701 \n",
"4 132207.28391 \n",
"\n",
" Unnamed: 4 \\\n",
"0 NaN \n",
"1 Drive time outside 20mins of town of 10,000 or... \n",
"2 100 \n",
"3 103.003978 \n",
"4 105.619893 \n",
"\n",
" Unnamed: 5 \\\n",
"0 NaN \n",
"1 Drive time outside 20mins of town of 10,000 or... \n",
"2 122528.427865 \n",
"3 126209.155363 \n",
"4 129414.394046 \n",
"\n",
" Unnamed: 6 \\\n",
"0 NaN \n",
"1 Drive time within 1hr of Belfast Index \n",
"2 100 \n",
"3 103.025069 \n",
"4 105.031061 \n",
"\n",
" Unnamed: 7 \\\n",
"0 NaN \n",
"1 Drive time within 1hr of Belfast Price\\n(Ref:... \n",
"2 128955.274996 \n",
"3 132856.260679 \n",
"4 135443.093443 \n",
"\n",
" Unnamed: 8 \\\n",
"0 NaN \n",
"1 Drive time outside 1hr of Belfast Index \n",
"2 100 \n",
"3 103.349406 \n",
"4 107.811831 \n",
"\n",
" Unnamed: 9 Unnamed: 10 \n",
"0 NaN NaN \n",
"1 Drive time outside 1hr of Belfast Price\\n(Ref... NaN \n",
"2 111866.40498 100.0 \n",
"3 115613.265107 100.0 \n",
"4 120605.219276 100.0 "
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = source_df['Table 7'].copy()\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "77b3455b-eb04-490f-b7bf-943abf98416e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 7: Standardised House Price & Index for Rural Areas of Northern Ireland by drive times | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
" Unnamed: 6 | \n",
" Unnamed: 7 | \n",
" Unnamed: 8 | \n",
" Unnamed: 9 | \n",
" Unnamed: 10 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Year | \n",
" Quarter | \n",
" Drive time within 20mins of town of 10,000 or ... | \n",
" Drive time within 20mins of town of 10,000 or ... | \n",
" Drive time outside 20mins of town of 10,000 or... | \n",
" Drive time outside 20mins of town of 10,000 or... | \n",
" Drive time within 1hr of Belfast Index | \n",
" Drive time within 1hr of Belfast Price\\n(Ref:... | \n",
" Drive time outside 1hr of Belfast Index | \n",
" Drive time outside 1hr of Belfast Price\\n(Ref... | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 2015 | \n",
" Q1 | \n",
" 100 | \n",
" 124898.676844 | \n",
" 100 | \n",
" 122528.427865 | \n",
" 100 | \n",
" 128955.274996 | \n",
" 100 | \n",
" 111866.40498 | \n",
" 100.0 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" Q2 | \n",
" 103.166882 | \n",
" 128854.070701 | \n",
" 103.003978 | \n",
" 126209.155363 | \n",
" 103.025069 | \n",
" 132856.260679 | \n",
" 103.349406 | \n",
" 115613.265107 | \n",
" 100.0 | \n",
"
\n",
" \n",
" 4 | \n",
" NaN | \n",
" Q3 | \n",
" 105.851629 | \n",
" 132207.28391 | \n",
" 105.619893 | \n",
" 129414.394046 | \n",
" 105.031061 | \n",
" 135443.093443 | \n",
" 107.811831 | \n",
" 120605.219276 | \n",
" 100.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Table 7: Standardised House Price & Index for Rural Areas of Northern Ireland by drive times \\\n",
"0 Back to contents \n",
"1 Year \n",
"2 2015 \n",
"3 NaN \n",
"4 NaN \n",
"\n",
" Unnamed: 1 Unnamed: 2 \\\n",
"0 NaN NaN \n",
"1 Quarter Drive time within 20mins of town of 10,000 or ... \n",
"2 Q1 100 \n",
"3 Q2 103.166882 \n",
"4 Q3 105.851629 \n",
"\n",
" Unnamed: 3 \\\n",
"0 NaN \n",
"1 Drive time within 20mins of town of 10,000 or ... \n",
"2 124898.676844 \n",
"3 128854.070701 \n",
"4 132207.28391 \n",
"\n",
" Unnamed: 4 \\\n",
"0 NaN \n",
"1 Drive time outside 20mins of town of 10,000 or... \n",
"2 100 \n",
"3 103.003978 \n",
"4 105.619893 \n",
"\n",
" Unnamed: 5 \\\n",
"0 NaN \n",
"1 Drive time outside 20mins of town of 10,000 or... \n",
"2 122528.427865 \n",
"3 126209.155363 \n",
"4 129414.394046 \n",
"\n",
" Unnamed: 6 \\\n",
"0 NaN \n",
"1 Drive time within 1hr of Belfast Index \n",
"2 100 \n",
"3 103.025069 \n",
"4 105.031061 \n",
"\n",
" Unnamed: 7 \\\n",
"0 NaN \n",
"1 Drive time within 1hr of Belfast Price\\n(Ref:... \n",
"2 128955.274996 \n",
"3 132856.260679 \n",
"4 135443.093443 \n",
"\n",
" Unnamed: 8 \\\n",
"0 NaN \n",
"1 Drive time outside 1hr of Belfast Index \n",
"2 100 \n",
"3 103.349406 \n",
"4 107.811831 \n",
"\n",
" Unnamed: 9 Unnamed: 10 \n",
"0 NaN NaN \n",
"1 Drive time outside 1hr of Belfast Price\\n(Ref... NaN \n",
"2 111866.40498 100.0 \n",
"3 115613.265107 100.0 \n",
"4 120605.219276 100.0 "
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[1,0] = 'Year'\n",
"df.iloc[1,1] = 'Quarter'\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "82cbc57e-9e5d-4696-bfc3-ddd61e845ca4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Drive time within 20mins of town of 10,000 or more Index | \n",
" Drive time within 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) | \n",
" Drive time outside 20mins of town of 10,000 or more Index | \n",
" Drive time outside 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) | \n",
" Drive time within 1hr of Belfast Index | \n",
" Drive time within 1hr of Belfast Price\\n(Ref: Q1 2015) | \n",
" Drive time outside 1hr of Belfast Index | \n",
" Drive time outside 1hr of Belfast Price\\n(Ref: Q1 2015) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2015Q1 | \n",
" 2015 | \n",
" Q1 | \n",
" 100.000000 | \n",
" 124898.676844 | \n",
" 100.000000 | \n",
" 122528.427865 | \n",
" 100.000000 | \n",
" 128955.274996 | \n",
" 100.000000 | \n",
" 111866.404980 | \n",
"
\n",
" \n",
" 1 | \n",
" 2015Q2 | \n",
" 2015 | \n",
" Q2 | \n",
" 103.166882 | \n",
" 128854.070701 | \n",
" 103.003978 | \n",
" 126209.155363 | \n",
" 103.025069 | \n",
" 132856.260679 | \n",
" 103.349406 | \n",
" 115613.265107 | \n",
"
\n",
" \n",
" 2 | \n",
" 2015Q3 | \n",
" 2015 | \n",
" Q3 | \n",
" 105.851629 | \n",
" 132207.283910 | \n",
" 105.619893 | \n",
" 129414.394046 | \n",
" 105.031061 | \n",
" 135443.093443 | \n",
" 107.811831 | \n",
" 120605.219276 | \n",
"
\n",
" \n",
" 3 | \n",
" 2015Q4 | \n",
" 2015 | \n",
" Q4 | \n",
" 107.430656 | \n",
" 134179.467306 | \n",
" 106.924715 | \n",
" 131013.172436 | \n",
" 106.240145 | \n",
" 137002.270924 | \n",
" 110.075053 | \n",
" 123137.004353 | \n",
"
\n",
" \n",
" 4 | \n",
" 2016Q1 | \n",
" 2016 | \n",
" Q1 | \n",
" 108.909364 | \n",
" 136026.354775 | \n",
" 108.368772 | \n",
" 132782.552750 | \n",
" 107.604887 | \n",
" 138762.178070 | \n",
" 111.828874 | \n",
" 125098.941485 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter \\\n",
"0 2015Q1 2015 Q1 \n",
"1 2015Q2 2015 Q2 \n",
"2 2015Q3 2015 Q3 \n",
"3 2015Q4 2015 Q4 \n",
"4 2016Q1 2016 Q1 \n",
"\n",
"1 Drive time within 20mins of town of 10,000 or more Index \\\n",
"0 100.000000 \n",
"1 103.166882 \n",
"2 105.851629 \n",
"3 107.430656 \n",
"4 108.909364 \n",
"\n",
"1 Drive time within 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) \\\n",
"0 124898.676844 \n",
"1 128854.070701 \n",
"2 132207.283910 \n",
"3 134179.467306 \n",
"4 136026.354775 \n",
"\n",
"1 Drive time outside 20mins of town of 10,000 or more Index \\\n",
"0 100.000000 \n",
"1 103.003978 \n",
"2 105.619893 \n",
"3 106.924715 \n",
"4 108.368772 \n",
"\n",
"1 Drive time outside 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) \\\n",
"0 122528.427865 \n",
"1 126209.155363 \n",
"2 129414.394046 \n",
"3 131013.172436 \n",
"4 132782.552750 \n",
"\n",
"1 Drive time within 1hr of Belfast Index \\\n",
"0 100.000000 \n",
"1 103.025069 \n",
"2 105.031061 \n",
"3 106.240145 \n",
"4 107.604887 \n",
"\n",
"1 Drive time within 1hr of Belfast Price\\n(Ref: Q1 2015) \\\n",
"0 128955.274996 \n",
"1 132856.260679 \n",
"2 135443.093443 \n",
"3 137002.270924 \n",
"4 138762.178070 \n",
"\n",
"1 Drive time outside 1hr of Belfast Index \\\n",
"0 100.000000 \n",
"1 103.349406 \n",
"2 107.811831 \n",
"3 110.075053 \n",
"4 111.828874 \n",
"\n",
"1 Drive time outside 1hr of Belfast Price\\n(Ref: Q1 2015) \n",
"0 111866.404980 \n",
"1 115613.265107 \n",
"2 120605.219276 \n",
"3 123137.004353 \n",
"4 125098.941485 "
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basic_cleanup(df).head()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "2fb4f831-892c-4026-96b1-d914c98e22a0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Drive time within 20mins of town of 10,000 or more Index | \n",
" Drive time within 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) | \n",
" Drive time outside 20mins of town of 10,000 or more Index | \n",
" Drive time outside 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) | \n",
" Drive time within 1hr of Belfast Index | \n",
" Drive time within 1hr of Belfast Price\\n(Ref: Q1 2015) | \n",
" Drive time outside 1hr of Belfast Index | \n",
" Drive time outside 1hr of Belfast Price\\n(Ref: Q1 2015) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2015Q1 | \n",
" 2015 | \n",
" Q1 | \n",
" 100.000000 | \n",
" 124898.676844 | \n",
" 100.000000 | \n",
" 122528.427865 | \n",
" 100.000000 | \n",
" 128955.274996 | \n",
" 100.000000 | \n",
" 111866.404980 | \n",
"
\n",
" \n",
" 1 | \n",
" 2015Q2 | \n",
" 2015 | \n",
" Q2 | \n",
" 103.166882 | \n",
" 128854.070701 | \n",
" 103.003978 | \n",
" 126209.155363 | \n",
" 103.025069 | \n",
" 132856.260679 | \n",
" 103.349406 | \n",
" 115613.265107 | \n",
"
\n",
" \n",
" 2 | \n",
" 2015Q3 | \n",
" 2015 | \n",
" Q3 | \n",
" 105.851629 | \n",
" 132207.283910 | \n",
" 105.619893 | \n",
" 129414.394046 | \n",
" 105.031061 | \n",
" 135443.093443 | \n",
" 107.811831 | \n",
" 120605.219276 | \n",
"
\n",
" \n",
" 3 | \n",
" 2015Q4 | \n",
" 2015 | \n",
" Q4 | \n",
" 107.430656 | \n",
" 134179.467306 | \n",
" 106.924715 | \n",
" 131013.172436 | \n",
" 106.240145 | \n",
" 137002.270924 | \n",
" 110.075053 | \n",
" 123137.004353 | \n",
"
\n",
" \n",
" 4 | \n",
" 2016Q1 | \n",
" 2016 | \n",
" Q1 | \n",
" 108.909364 | \n",
" 136026.354775 | \n",
" 108.368772 | \n",
" 132782.552750 | \n",
" 107.604887 | \n",
" 138762.178070 | \n",
" 111.828874 | \n",
" 125098.941485 | \n",
"
\n",
" \n",
" 5 | \n",
" 2016Q2 | \n",
" 2016 | \n",
" Q2 | \n",
" 111.263396 | \n",
" 138966.509219 | \n",
" 109.739250 | \n",
" 134461.778232 | \n",
" 110.208116 | \n",
" 142119.179594 | \n",
" 111.991819 | \n",
" 125281.221326 | \n",
"
\n",
" \n",
" 6 | \n",
" 2016Q3 | \n",
" 2016 | \n",
" Q3 | \n",
" 113.419541 | \n",
" 141659.506269 | \n",
" 112.426034 | \n",
" 137753.851946 | \n",
" 112.202571 | \n",
" 144691.133778 | \n",
" 115.398242 | \n",
" 129091.864904 | \n",
"
\n",
" \n",
" 7 | \n",
" 2016Q4 | \n",
" 2016 | \n",
" Q4 | \n",
" 113.928074 | \n",
" 142294.657346 | \n",
" 113.219995 | \n",
" 138726.680412 | \n",
" 112.508660 | \n",
" 145085.851672 | \n",
" 116.952798 | \n",
" 130830.890712 | \n",
"
\n",
" \n",
" 8 | \n",
" 2017Q1 | \n",
" 2017 | \n",
" Q1 | \n",
" 114.262386 | \n",
" 142712.207695 | \n",
" 113.549623 | \n",
" 139130.567598 | \n",
" 112.823330 | \n",
" 145491.635911 | \n",
" 117.341538 | \n",
" 131265.759778 | \n",
"
\n",
" \n",
" 9 | \n",
" 2017Q2 | \n",
" 2017 | \n",
" Q2 | \n",
" 115.566592 | \n",
" 144341.144812 | \n",
" 115.829688 | \n",
" 141924.295411 | \n",
" 114.288862 | \n",
" 147381.515712 | \n",
" 119.397027 | \n",
" 133565.161466 | \n",
"
\n",
" \n",
" 10 | \n",
" 2017Q3 | \n",
" 2017 | \n",
" Q3 | \n",
" 116.716428 | \n",
" 145777.273752 | \n",
" 117.061832 | \n",
" 143434.022704 | \n",
" 115.192251 | \n",
" 148546.484147 | \n",
" 121.273582 | \n",
" 135664.395891 | \n",
"
\n",
" \n",
" 11 | \n",
" 2017Q4 | \n",
" 2017 | \n",
" Q4 | \n",
" 117.925340 | \n",
" 147287.189812 | \n",
" 118.541541 | \n",
" 145247.086931 | \n",
" 116.101165 | \n",
" 149718.576638 | \n",
" 123.577530 | \n",
" 138241.740123 | \n",
"
\n",
" \n",
" 12 | \n",
" 2018Q1 | \n",
" 2018 | \n",
" Q1 | \n",
" 118.482802 | \n",
" 147983.452250 | \n",
" 120.184585 | \n",
" 147260.282195 | \n",
" 117.972698 | \n",
" 152132.017714 | \n",
" 122.075418 | \n",
" 136561.381608 | \n",
"
\n",
" \n",
" 13 | \n",
" 2018Q2 | \n",
" 2018 | \n",
" Q2 | \n",
" 119.443631 | \n",
" 149183.514842 | \n",
" 120.710551 | \n",
" 147904.740501 | \n",
" 117.686726 | \n",
" 151763.241043 | \n",
" 125.603705 | \n",
" 140508.349303 | \n",
"
\n",
" \n",
" 14 | \n",
" 2018Q3 | \n",
" 2018 | \n",
" Q3 | \n",
" 121.408923 | \n",
" 151638.138779 | \n",
" 122.222994 | \n",
" 149757.912837 | \n",
" 119.614730 | \n",
" 154249.503782 | \n",
" 127.210974 | \n",
" 142306.342946 | \n",
"
\n",
" \n",
" 15 | \n",
" 2018Q4 | \n",
" 2018 | \n",
" Q4 | \n",
" 123.531419 | \n",
" 154289.108214 | \n",
" 125.254013 | \n",
" 153471.772944 | \n",
" 121.462721 | \n",
" 156632.586327 | \n",
" 130.953652 | \n",
" 146493.143086 | \n",
"
\n",
" \n",
" 16 | \n",
" 2019Q1 | \n",
" 2019 | \n",
" Q1 | \n",
" 122.499375 | \n",
" 153000.098716 | \n",
" 123.207618 | \n",
" 150964.357154 | \n",
" 120.087144 | \n",
" 154858.706540 | \n",
" 129.712835 | \n",
" 145105.085124 | \n",
"
\n",
" \n",
" 17 | \n",
" 2019Q2 | \n",
" 2019 | \n",
" Q2 | \n",
" 124.397722 | \n",
" 155371.109292 | \n",
" 125.151589 | \n",
" 153346.274193 | \n",
" 122.486275 | \n",
" 157952.513361 | \n",
" 130.480634 | \n",
" 145963.994647 | \n",
"
\n",
" \n",
" 18 | \n",
" 2019Q3 | \n",
" 2019 | \n",
" Q3 | \n",
" 126.533407 | \n",
" 158038.551430 | \n",
" 128.647747 | \n",
" 157630.061642 | \n",
" 124.978137 | \n",
" 161165.900455 | \n",
" 133.258945 | \n",
" 149071.990904 | \n",
"
\n",
" \n",
" 19 | \n",
" 2019Q4 | \n",
" 2019 | \n",
" Q4 | \n",
" 127.126748 | \n",
" 158779.626458 | \n",
" 127.784267 | \n",
" 156572.053236 | \n",
" 124.547746 | \n",
" 160610.887802 | \n",
" 134.709059 | \n",
" 150694.181735 | \n",
"
\n",
" \n",
" 20 | \n",
" 2020Q1 | \n",
" 2020 | \n",
" Q1 | \n",
" 127.090324 | \n",
" 158734.133127 | \n",
" 128.619521 | \n",
" 157595.476721 | \n",
" 124.744271 | \n",
" 160864.317472 | \n",
" 135.068006 | \n",
" 151095.722593 | \n",
"
\n",
" \n",
" 21 | \n",
" 2020Q2 | \n",
" 2020 | \n",
" Q2 | \n",
" 127.200617 | \n",
" 158871.887068 | \n",
" 127.231209 | \n",
" 155894.400318 | \n",
" 125.021931 | \n",
" 161222.374474 | \n",
" 132.935126 | \n",
" 148709.746207 | \n",
"
\n",
" \n",
" 22 | \n",
" 2020Q3 | \n",
" 2020 | \n",
" Q3 | \n",
" 129.627870 | \n",
" 161903.493901 | \n",
" 131.083667 | \n",
" 160614.756005 | \n",
" 127.873046 | \n",
" 164899.037745 | \n",
" 135.952621 | \n",
" 152085.309777 | \n",
"
\n",
" \n",
" 23 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 132.853240 | \n",
" 165931.938294 | \n",
" 135.873737 | \n",
" 166483.954356 | \n",
" 130.725554 | \n",
" 168577.497418 | \n",
" 142.032974 | \n",
" 158887.182045 | \n",
"
\n",
" \n",
" 24 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 134.414458 | \n",
" 167881.879606 | \n",
" 136.978231 | \n",
" 167837.272586 | \n",
" 132.479831 | \n",
" 170839.730568 | \n",
" 142.532053 | \n",
" 159445.483290 | \n",
"
\n",
" \n",
" 25 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 139.417605 | \n",
" 174130.744141 | \n",
" 142.727536 | \n",
" 174881.805508 | \n",
" 137.737680 | \n",
" 177620.004609 | \n",
" 147.652942 | \n",
" 165174.037638 | \n",
"
\n",
" \n",
" 26 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 143.303934 | \n",
" 178984.717757 | \n",
" 147.615067 | \n",
" 180870.420630 | \n",
" 141.482937 | \n",
" 182449.710341 | \n",
" 153.161532 | \n",
" 171336.300173 | \n",
"
\n",
" \n",
" 27 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 144.731984 | \n",
" 180768.332630 | \n",
" 149.013971 | \n",
" 182584.475980 | \n",
" 142.365385 | \n",
" 183587.673223 | \n",
" 156.204293 | \n",
" 174740.127539 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter \\\n",
"0 2015Q1 2015 Q1 \n",
"1 2015Q2 2015 Q2 \n",
"2 2015Q3 2015 Q3 \n",
"3 2015Q4 2015 Q4 \n",
"4 2016Q1 2016 Q1 \n",
"5 2016Q2 2016 Q2 \n",
"6 2016Q3 2016 Q3 \n",
"7 2016Q4 2016 Q4 \n",
"8 2017Q1 2017 Q1 \n",
"9 2017Q2 2017 Q2 \n",
"10 2017Q3 2017 Q3 \n",
"11 2017Q4 2017 Q4 \n",
"12 2018Q1 2018 Q1 \n",
"13 2018Q2 2018 Q2 \n",
"14 2018Q3 2018 Q3 \n",
"15 2018Q4 2018 Q4 \n",
"16 2019Q1 2019 Q1 \n",
"17 2019Q2 2019 Q2 \n",
"18 2019Q3 2019 Q3 \n",
"19 2019Q4 2019 Q4 \n",
"20 2020Q1 2020 Q1 \n",
"21 2020Q2 2020 Q2 \n",
"22 2020Q3 2020 Q3 \n",
"23 2020Q4 2020 Q4 \n",
"24 2021Q1 2021 Q1 \n",
"25 2021Q2 2021 Q2 \n",
"26 2021Q3 2021 Q3 \n",
"27 2021Q4 2021 Q4 \n",
"\n",
"1 Drive time within 20mins of town of 10,000 or more Index \\\n",
"0 100.000000 \n",
"1 103.166882 \n",
"2 105.851629 \n",
"3 107.430656 \n",
"4 108.909364 \n",
"5 111.263396 \n",
"6 113.419541 \n",
"7 113.928074 \n",
"8 114.262386 \n",
"9 115.566592 \n",
"10 116.716428 \n",
"11 117.925340 \n",
"12 118.482802 \n",
"13 119.443631 \n",
"14 121.408923 \n",
"15 123.531419 \n",
"16 122.499375 \n",
"17 124.397722 \n",
"18 126.533407 \n",
"19 127.126748 \n",
"20 127.090324 \n",
"21 127.200617 \n",
"22 129.627870 \n",
"23 132.853240 \n",
"24 134.414458 \n",
"25 139.417605 \n",
"26 143.303934 \n",
"27 144.731984 \n",
"\n",
"1 Drive time within 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) \\\n",
"0 124898.676844 \n",
"1 128854.070701 \n",
"2 132207.283910 \n",
"3 134179.467306 \n",
"4 136026.354775 \n",
"5 138966.509219 \n",
"6 141659.506269 \n",
"7 142294.657346 \n",
"8 142712.207695 \n",
"9 144341.144812 \n",
"10 145777.273752 \n",
"11 147287.189812 \n",
"12 147983.452250 \n",
"13 149183.514842 \n",
"14 151638.138779 \n",
"15 154289.108214 \n",
"16 153000.098716 \n",
"17 155371.109292 \n",
"18 158038.551430 \n",
"19 158779.626458 \n",
"20 158734.133127 \n",
"21 158871.887068 \n",
"22 161903.493901 \n",
"23 165931.938294 \n",
"24 167881.879606 \n",
"25 174130.744141 \n",
"26 178984.717757 \n",
"27 180768.332630 \n",
"\n",
"1 Drive time outside 20mins of town of 10,000 or more Index \\\n",
"0 100.000000 \n",
"1 103.003978 \n",
"2 105.619893 \n",
"3 106.924715 \n",
"4 108.368772 \n",
"5 109.739250 \n",
"6 112.426034 \n",
"7 113.219995 \n",
"8 113.549623 \n",
"9 115.829688 \n",
"10 117.061832 \n",
"11 118.541541 \n",
"12 120.184585 \n",
"13 120.710551 \n",
"14 122.222994 \n",
"15 125.254013 \n",
"16 123.207618 \n",
"17 125.151589 \n",
"18 128.647747 \n",
"19 127.784267 \n",
"20 128.619521 \n",
"21 127.231209 \n",
"22 131.083667 \n",
"23 135.873737 \n",
"24 136.978231 \n",
"25 142.727536 \n",
"26 147.615067 \n",
"27 149.013971 \n",
"\n",
"1 Drive time outside 20mins of town of 10,000 or more Price\\n(Ref: Q1 2015) \\\n",
"0 122528.427865 \n",
"1 126209.155363 \n",
"2 129414.394046 \n",
"3 131013.172436 \n",
"4 132782.552750 \n",
"5 134461.778232 \n",
"6 137753.851946 \n",
"7 138726.680412 \n",
"8 139130.567598 \n",
"9 141924.295411 \n",
"10 143434.022704 \n",
"11 145247.086931 \n",
"12 147260.282195 \n",
"13 147904.740501 \n",
"14 149757.912837 \n",
"15 153471.772944 \n",
"16 150964.357154 \n",
"17 153346.274193 \n",
"18 157630.061642 \n",
"19 156572.053236 \n",
"20 157595.476721 \n",
"21 155894.400318 \n",
"22 160614.756005 \n",
"23 166483.954356 \n",
"24 167837.272586 \n",
"25 174881.805508 \n",
"26 180870.420630 \n",
"27 182584.475980 \n",
"\n",
"1 Drive time within 1hr of Belfast Index \\\n",
"0 100.000000 \n",
"1 103.025069 \n",
"2 105.031061 \n",
"3 106.240145 \n",
"4 107.604887 \n",
"5 110.208116 \n",
"6 112.202571 \n",
"7 112.508660 \n",
"8 112.823330 \n",
"9 114.288862 \n",
"10 115.192251 \n",
"11 116.101165 \n",
"12 117.972698 \n",
"13 117.686726 \n",
"14 119.614730 \n",
"15 121.462721 \n",
"16 120.087144 \n",
"17 122.486275 \n",
"18 124.978137 \n",
"19 124.547746 \n",
"20 124.744271 \n",
"21 125.021931 \n",
"22 127.873046 \n",
"23 130.725554 \n",
"24 132.479831 \n",
"25 137.737680 \n",
"26 141.482937 \n",
"27 142.365385 \n",
"\n",
"1 Drive time within 1hr of Belfast Price\\n(Ref: Q1 2015) \\\n",
"0 128955.274996 \n",
"1 132856.260679 \n",
"2 135443.093443 \n",
"3 137002.270924 \n",
"4 138762.178070 \n",
"5 142119.179594 \n",
"6 144691.133778 \n",
"7 145085.851672 \n",
"8 145491.635911 \n",
"9 147381.515712 \n",
"10 148546.484147 \n",
"11 149718.576638 \n",
"12 152132.017714 \n",
"13 151763.241043 \n",
"14 154249.503782 \n",
"15 156632.586327 \n",
"16 154858.706540 \n",
"17 157952.513361 \n",
"18 161165.900455 \n",
"19 160610.887802 \n",
"20 160864.317472 \n",
"21 161222.374474 \n",
"22 164899.037745 \n",
"23 168577.497418 \n",
"24 170839.730568 \n",
"25 177620.004609 \n",
"26 182449.710341 \n",
"27 183587.673223 \n",
"\n",
"1 Drive time outside 1hr of Belfast Index \\\n",
"0 100.000000 \n",
"1 103.349406 \n",
"2 107.811831 \n",
"3 110.075053 \n",
"4 111.828874 \n",
"5 111.991819 \n",
"6 115.398242 \n",
"7 116.952798 \n",
"8 117.341538 \n",
"9 119.397027 \n",
"10 121.273582 \n",
"11 123.577530 \n",
"12 122.075418 \n",
"13 125.603705 \n",
"14 127.210974 \n",
"15 130.953652 \n",
"16 129.712835 \n",
"17 130.480634 \n",
"18 133.258945 \n",
"19 134.709059 \n",
"20 135.068006 \n",
"21 132.935126 \n",
"22 135.952621 \n",
"23 142.032974 \n",
"24 142.532053 \n",
"25 147.652942 \n",
"26 153.161532 \n",
"27 156.204293 \n",
"\n",
"1 Drive time outside 1hr of Belfast Price\\n(Ref: Q1 2015) \n",
"0 111866.404980 \n",
"1 115613.265107 \n",
"2 120605.219276 \n",
"3 123137.004353 \n",
"4 125098.941485 \n",
"5 125281.221326 \n",
"6 129091.864904 \n",
"7 130830.890712 \n",
"8 131265.759778 \n",
"9 133565.161466 \n",
"10 135664.395891 \n",
"11 138241.740123 \n",
"12 136561.381608 \n",
"13 140508.349303 \n",
"14 142306.342946 \n",
"15 146493.143086 \n",
"16 145105.085124 \n",
"17 145963.994647 \n",
"18 149071.990904 \n",
"19 150694.181735 \n",
"20 151095.722593 \n",
"21 148709.746207 \n",
"22 152085.309777 \n",
"23 158887.182045 \n",
"24 159445.483290 \n",
"25 165174.037638 \n",
"26 171336.300173 \n",
"27 174740.127539 "
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def cleanup_table_7(df):\n",
" \"\"\"\n",
" Table 7: Standardised House Price & Index for Rural Areas of Northern Ireland by drive times\n",
" * Insert Year/Quarter future-headers\n",
" * Clean normally\n",
" # TODO THIS MIGHT BE VALID FOR MULTIINDEXING ON DRIVETIME/[Index/Price]\n",
" \"\"\"\n",
" df = df.copy()\n",
" df.iloc[1,0] = 'Year'\n",
" df.iloc[1,1] = 'Quarter'\n",
" df = basic_cleanup(df)\n",
" return df\n",
"\n",
"cleanup_table_7(source_df['Table 7'])"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "93436594-0e73-4c3f-ad2d-1532017dbd87",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(15, 17)"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 7'] = cleanup_table_7(source_df['Table 7'])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "5a53ac65-10d3-47e7-a046-4e94573ddd79",
"metadata": {},
"source": [
"### Table 8: Number of Verified Residential Property Sales of properties in urban and rural areas and properties in rural areas by drive times witihn towns of 10,000 or more and within 1 hour of Belfast\n",
"\n",
"We're now getting into the swing of this!\n",
"\n",
"This one has two similar problems we've already seen; Munged Quarters/Years (this time with no header on that column...), and annual Total rows.\n",
"\n",
"> Vee must deeel with it"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "b2a92074-6e50-4eb5-a102-26f5d09f5fa9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 2 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Urban | \n",
" Rural | \n",
" Drive time within 20mins of town of 10,000 or more | \n",
" Drive time outside 20mins of town of 10,000 or more | \n",
" Drive time within 1hr of Belfast | \n",
" Drive time outside 1hr of Belfast | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2015Q1 | \n",
" 2015 | \n",
" Q1 | \n",
" 3294 | \n",
" 1322 | \n",
" 898 | \n",
" 424 | \n",
" 976 | \n",
" 346 | \n",
"
\n",
" \n",
" 1 | \n",
" 2015Q2 | \n",
" 2015 | \n",
" Q2 | \n",
" 3789 | \n",
" 1500 | \n",
" 1034 | \n",
" 466 | \n",
" 1142 | \n",
" 358 | \n",
"
\n",
" \n",
" 2 | \n",
" 2015Q3 | \n",
" 2015 | \n",
" Q3 | \n",
" 4199 | \n",
" 1640 | \n",
" 1145 | \n",
" 495 | \n",
" 1250 | \n",
" 390 | \n",
"
\n",
" \n",
" 3 | \n",
" 2015Q4 | \n",
" 2015 | \n",
" Q4 | \n",
" 4396 | \n",
" 1780 | \n",
" 1223 | \n",
" 557 | \n",
" 1342 | \n",
" 438 | \n",
"
\n",
" \n",
" 4 | \n",
" 2016Q1 | \n",
" 2016 | \n",
" Q1 | \n",
" 4424 | \n",
" 1731 | \n",
" 1171 | \n",
" 560 | \n",
" 1263 | \n",
" 468 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"2 Period Year Quarter Urban Rural \\\n",
"0 2015Q1 2015 Q1 3294 1322 \n",
"1 2015Q2 2015 Q2 3789 1500 \n",
"2 2015Q3 2015 Q3 4199 1640 \n",
"3 2015Q4 2015 Q4 4396 1780 \n",
"4 2016Q1 2016 Q1 4424 1731 \n",
"\n",
"2 Drive time within 20mins of town of 10,000 or more \\\n",
"0 898 \n",
"1 1034 \n",
"2 1145 \n",
"3 1223 \n",
"4 1171 \n",
"\n",
"2 Drive time outside 20mins of town of 10,000 or more \\\n",
"0 424 \n",
"1 466 \n",
"2 495 \n",
"3 557 \n",
"4 560 \n",
"\n",
"2 Drive time within 1hr of Belfast Drive time outside 1hr of Belfast \n",
"0 976 346 \n",
"1 1142 358 \n",
"2 1250 390 \n",
"3 1342 438 \n",
"4 1263 468 "
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cleanup_table_5a(source_df['Table 8']).head()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "98b85e13-b103-4af2-9354-5df6c7cac6ca",
"metadata": {},
"outputs": [],
"source": [
"cleanup_table_8 = cleanup_table_5a"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "9351206d-2966-4d85-a94d-ac69faedd96d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(16, 16)"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 8'] = cleanup_table_8(source_df['Table 8'])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "43f96cae-cff0-435c-8ed7-ec38afa708f4",
"metadata": {},
"source": [
"### Table 9: NI Average Sales Prices Q1 2005 - Q4 2021\n",
"\n",
"Wee buns"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "0e875abd-d1ef-49df-a824-f02d1fd9ba3d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Simple Mean | \n",
" Simple Median | \n",
" Standardised Price (HPI) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 115912.942222 | \n",
" 100000 | \n",
" 111920.268199 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 120481.290591 | \n",
" 105000 | \n",
" 116004.031639 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 128866.225917 | \n",
" 115000 | \n",
" 123386.352673 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 129649.092074 | \n",
" 117000 | \n",
" 127674.143865 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 132972.115070 | \n",
" 120000 | \n",
" 131302.064422 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 171803.199843 | \n",
" 150000 | \n",
" 147474.561707 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 176218.214924 | \n",
" 150000 | \n",
" 149084.306040 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 184144.458946 | \n",
" 154950 | \n",
" 154323.134643 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 173490.230508 | \n",
" 155000 | \n",
" 159028.118093 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 159965.154863 | \n",
" 141000 | \n",
" 159150.737832 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 6 columns
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter Simple Mean Simple Median \\\n",
"0 2005Q1 2005 Q1 115912.942222 100000 \n",
"1 2005Q2 2005 Q2 120481.290591 105000 \n",
"2 2005Q3 2005 Q3 128866.225917 115000 \n",
"3 2005Q4 2005 Q4 129649.092074 117000 \n",
"4 2006Q1 2006 Q1 132972.115070 120000 \n",
".. ... ... ... ... ... \n",
"63 2020Q4 2020 Q4 171803.199843 150000 \n",
"64 2021Q1 2021 Q1 176218.214924 150000 \n",
"65 2021Q2 2021 Q2 184144.458946 154950 \n",
"66 2021Q3 2021 Q3 173490.230508 155000 \n",
"67 2021Q4 2021 Q4 159965.154863 141000 \n",
"\n",
"1 Standardised Price (HPI) \n",
"0 111920.268199 \n",
"1 116004.031639 \n",
"2 123386.352673 \n",
"3 127674.143865 \n",
"4 131302.064422 \n",
".. ... \n",
"63 147474.561707 \n",
"64 149084.306040 \n",
"65 154323.134643 \n",
"66 159028.118093 \n",
"67 159150.737832 \n",
"\n",
"[68 rows x 6 columns]"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basic_cleanup(source_df['Table 9'])"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "5e7ae839-93a5-4ea7-91e2-7377943d811c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(17, 15)"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dest_df['Table 9'] = basic_cleanup(source_df['Table 9'])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "32baa649-8fb5-44d2-9b08-019698c560aa",
"metadata": {},
"source": [
"### Table 9x: NI Average Sale Prices XXXXX Property Q1 2005 - Q4 2021\n",
"\n",
"These are very similar to Tables 2x; i.e. they're broken down by property type.\n",
"\n",
"Annoyingly, they don't follow the same structure as Tables 2x or Table 9 because they don't include the Year/Quarter headers. \n",
"\n",
"If that reminds you of anything, it's because Table 7 was the same...\n"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "36e86da2-c138-412a-943b-907f335cbed1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 1 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Simple Mean | \n",
" Simple Median | \n",
" Standardised Price (HPI) | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 166314.816092 | \n",
" 149972.5 | \n",
" 160428.832662 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 173370.669076 | \n",
" 155000.0 | \n",
" 169686.542965 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 185397.896739 | \n",
" 165000.0 | \n",
" 180696.666810 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 186545.119355 | \n",
" 165000.0 | \n",
" 185323.883533 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 191328.398119 | \n",
" 173000.0 | \n",
" 188669.361197 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 243712.512641 | \n",
" 220000.0 | \n",
" 220592.113069 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 254182.439174 | \n",
" 225000.0 | \n",
" 224872.989982 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 268755.621299 | \n",
" 235000.0 | \n",
" 234734.715703 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 245860.399289 | \n",
" 225000.0 | \n",
" 239101.239764 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 244468.040738 | \n",
" 219000.0 | \n",
" 241131.373512 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 6 columns
\n",
"
"
],
"text/plain": [
"1 Period Year Quarter Simple Mean Simple Median \\\n",
"0 2005Q1 2005 Q1 166314.816092 149972.5 \n",
"1 2005Q2 2005 Q2 173370.669076 155000.0 \n",
"2 2005Q3 2005 Q3 185397.896739 165000.0 \n",
"3 2005Q4 2005 Q4 186545.119355 165000.0 \n",
"4 2006Q1 2006 Q1 191328.398119 173000.0 \n",
".. ... ... ... ... ... \n",
"63 2020Q4 2020 Q4 243712.512641 220000.0 \n",
"64 2021Q1 2021 Q1 254182.439174 225000.0 \n",
"65 2021Q2 2021 Q2 268755.621299 235000.0 \n",
"66 2021Q3 2021 Q3 245860.399289 225000.0 \n",
"67 2021Q4 2021 Q4 244468.040738 219000.0 \n",
"\n",
"1 Standardised Price (HPI) \n",
"0 160428.832662 \n",
"1 169686.542965 \n",
"2 180696.666810 \n",
"3 185323.883533 \n",
"4 188669.361197 \n",
".. ... \n",
"63 220592.113069 \n",
"64 224872.989982 \n",
"65 234734.715703 \n",
"66 239101.239764 \n",
"67 241131.373512 \n",
"\n",
"[68 rows x 6 columns]"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cleanup_table_7(source_df['Table 9a'])"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "2cff0801-bbe9-401c-936f-eb3a67413827",
"metadata": {},
"outputs": [],
"source": [
"cleanup_table_9x = cleanup_table_7"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "8f3c5110-efd4-40d8-9538-b35301ee963a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(21, 11)"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table9s = re.compile('Table 9[a-z]')\n",
"for table in source_df:\n",
" if table9s.match(table):\n",
" dest_df[table] = cleanup_table_9x(source_df[table])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "0a20b4b1-8bef-4317-bec5-3ef92142cf3e",
"metadata": {},
"source": [
"### Table 10x: Number of Verified Residential Property Sales by Type in XXXXX\n",
"Surprisingly, we're in the home straight; the remaining tables are all of the same structure, with familiar awkwardness.,,\n",
"\n",
"* Annual-Total Rows\n",
"* Munged Year/Quarter Column\n",
"* That column having a silly (but contextual) name\n",
"* a different offset\n",
"\n",
"Fortunately, we already have something like that from dealing with Table 5a!\n"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "20ee7c07-a161-4f33-8698-3c43a48dea94",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Table 10a: Number of Verified Residential Property Sales by Type in Antrim and Newtownabbey Council | \n",
" Unnamed: 1 | \n",
" Unnamed: 2 | \n",
" Unnamed: 3 | \n",
" Unnamed: 4 | \n",
" Unnamed: 5 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Please note figures for the 2 most recent quar... | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" Back to contents | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" ANTRIM AND NEWTOWNABBEY | \n",
" Apartments | \n",
" Detached | \n",
" Semi-Detached | \n",
" Terrace | \n",
" Total | \n",
"
\n",
" \n",
" 3 | \n",
" Q1 2005 | \n",
" 10 | \n",
" 61 | \n",
" 78 | \n",
" 87 | \n",
" 236 | \n",
"
\n",
" \n",
" 4 | \n",
" Q2 2005 | \n",
" 46 | \n",
" 213 | \n",
" 216 | \n",
" 260 | \n",
" 735 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 83 | \n",
" Q1 2021 | \n",
" 52 | \n",
" 222 | \n",
" 212 | \n",
" 166 | \n",
" 652 | \n",
"
\n",
" \n",
" 84 | \n",
" Q2 2021 | \n",
" 56 | \n",
" 217 | \n",
" 275 | \n",
" 176 | \n",
" 724 | \n",
"
\n",
" \n",
" 85 | \n",
" Q3 2021 | \n",
" 47 | \n",
" 222 | \n",
" 268 | \n",
" 202 | \n",
" 739 | \n",
"
\n",
" \n",
" 86 | \n",
" Q4 2021 | \n",
" 50 | \n",
" 117 | \n",
" 176 | \n",
" 189 | \n",
" 532 | \n",
"
\n",
" \n",
" 87 | \n",
" 2021 Total | \n",
" 205 | \n",
" 778 | \n",
" 931 | \n",
" 733 | \n",
" 2647 | \n",
"
\n",
" \n",
"
\n",
"
88 rows × 6 columns
\n",
"
"
],
"text/plain": [
" Table 10a: Number of Verified Residential Property Sales by Type in Antrim and Newtownabbey Council \\\n",
"0 Please note figures for the 2 most recent quar... \n",
"1 Back to contents \n",
"2 ANTRIM AND NEWTOWNABBEY \n",
"3 Q1 2005 \n",
"4 Q2 2005 \n",
".. ... \n",
"83 Q1 2021 \n",
"84 Q2 2021 \n",
"85 Q3 2021 \n",
"86 Q4 2021 \n",
"87 2021 Total \n",
"\n",
" Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 \n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN \n",
"2 Apartments Detached Semi-Detached Terrace Total \n",
"3 10 61 78 87 236 \n",
"4 46 213 216 260 735 \n",
".. ... ... ... ... ... \n",
"83 52 222 212 166 652 \n",
"84 56 217 275 176 724 \n",
"85 47 222 268 202 739 \n",
"86 50 117 176 189 532 \n",
"87 205 778 931 733 2647 \n",
"\n",
"[88 rows x 6 columns]"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"source_df['Table 10a']"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "562913c9-f276-4911-a7d8-394e9c1edd6a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" 2 | \n",
" Period | \n",
" Year | \n",
" Quarter | \n",
" Apartments | \n",
" Detached | \n",
" Semi-Detached | \n",
" Terrace | \n",
" Total | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2005Q1 | \n",
" 2005 | \n",
" Q1 | \n",
" 10 | \n",
" 61 | \n",
" 78 | \n",
" 87 | \n",
" 236 | \n",
"
\n",
" \n",
" 1 | \n",
" 2005Q2 | \n",
" 2005 | \n",
" Q2 | \n",
" 46 | \n",
" 213 | \n",
" 216 | \n",
" 260 | \n",
" 735 | \n",
"
\n",
" \n",
" 2 | \n",
" 2005Q3 | \n",
" 2005 | \n",
" Q3 | \n",
" 46 | \n",
" 214 | \n",
" 238 | \n",
" 259 | \n",
" 757 | \n",
"
\n",
" \n",
" 3 | \n",
" 2005Q4 | \n",
" 2005 | \n",
" Q4 | \n",
" 65 | \n",
" 227 | \n",
" 270 | \n",
" 331 | \n",
" 893 | \n",
"
\n",
" \n",
" 4 | \n",
" 2006Q1 | \n",
" 2006 | \n",
" Q1 | \n",
" 48 | \n",
" 186 | \n",
" 231 | \n",
" 296 | \n",
" 761 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 63 | \n",
" 2020Q4 | \n",
" 2020 | \n",
" Q4 | \n",
" 53 | \n",
" 248 | \n",
" 268 | \n",
" 187 | \n",
" 756 | \n",
"
\n",
" \n",
" 64 | \n",
" 2021Q1 | \n",
" 2021 | \n",
" Q1 | \n",
" 52 | \n",
" 222 | \n",
" 212 | \n",
" 166 | \n",
" 652 | \n",
"
\n",
" \n",
" 65 | \n",
" 2021Q2 | \n",
" 2021 | \n",
" Q2 | \n",
" 56 | \n",
" 217 | \n",
" 275 | \n",
" 176 | \n",
" 724 | \n",
"
\n",
" \n",
" 66 | \n",
" 2021Q3 | \n",
" 2021 | \n",
" Q3 | \n",
" 47 | \n",
" 222 | \n",
" 268 | \n",
" 202 | \n",
" 739 | \n",
"
\n",
" \n",
" 67 | \n",
" 2021Q4 | \n",
" 2021 | \n",
" Q4 | \n",
" 50 | \n",
" 117 | \n",
" 176 | \n",
" 189 | \n",
" 532 | \n",
"
\n",
" \n",
"
\n",
"
68 rows × 8 columns
\n",
"
"
],
"text/plain": [
"2 Period Year Quarter Apartments Detached Semi-Detached Terrace Total\n",
"0 2005Q1 2005 Q1 10 61 78 87 236\n",
"1 2005Q2 2005 Q2 46 213 216 260 735\n",
"2 2005Q3 2005 Q3 46 214 238 259 757\n",
"3 2005Q4 2005 Q4 65 227 270 331 893\n",
"4 2006Q1 2006 Q1 48 186 231 296 761\n",
".. ... ... ... ... ... ... ... ...\n",
"63 2020Q4 2020 Q4 53 248 268 187 756\n",
"64 2021Q1 2021 Q1 52 222 212 166 652\n",
"65 2021Q2 2021 Q2 56 217 275 176 724\n",
"66 2021Q3 2021 Q3 47 222 268 202 739\n",
"67 2021Q4 2021 Q4 50 117 176 189 532\n",
"\n",
"[68 rows x 8 columns]"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cleanup_table_5a(source_df['Table 10a'])"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "22a6d8d2-dbfb-49f1-9a71-3e5853c05835",
"metadata": {},
"outputs": [],
"source": [
"cleanup_table_10x = cleanup_table_5a"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "fbffb938-0270-42ec-9aa6-4a69ad90c5b9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(32, 0)"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table10s = re.compile('Table 10[a-z]')\n",
"for table in source_df:\n",
" if table10s.match(table):\n",
" dest_df[table] = cleanup_table_10x(source_df[table])\n",
"len(dest_df), len([k for k in source_df.keys() if k.startswith('Table') and k not in dest_df])"
]
},
{
"cell_type": "markdown",
"id": "9bbebb1a-f233-4685-9dd0-e1d586b9b0ef",
"metadata": {},
"source": [
"## And We're Done!\n",
"\n",
"So, we can see that while government open data is a pain, at least it's a ... consistently inconsistent pain?\n",
"\n",
"I hope this was helpful to someone else. \n"
]
},
{
"cell_type": "code",
"execution_count": 65,
"id": "dc8f0d66-b037-442f-ad5a-c28cde54c2ac",
"metadata": {},
"outputs": [],
"source": [
"dest_df['Contents'] = source_df['Contents'][source_df['Contents']['Worksheet Name'].str.startswith('Table')]"
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "0822ca15-9f12-40c1-a7d9-53d0ed22717c",
"metadata": {},
"outputs": [],
"source": [
"with pd.ExcelWriter('NI Housing Price Index.xlsx') as writer:\n",
" # Thankfully these are semantically sortable otherwise this would be a _massive_ pain\n",
" for k,df in sorted(dest_df.items()):\n",
" df.to_excel(writer, sheet_name=k)\n",
" "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}