{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to load the necessary libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:14:37.769341Z", "iopub.status.busy": "2026-02-23T20:14:37.769116Z", "iopub.status.idle": "2026-02-23T20:14:41.013344Z", "shell.execute_reply": "2026-02-23T20:14:41.011779Z" } }, "outputs": [], "source": [ "import pandas as pd # Used for data manipulation\n", "import numpy as np # Used for numerical operations\n", "import matplotlib.pyplot as plt # Used for plotting\n", "import spacy # Used for text preprocessing and NLP tasks\n", "from spacy import displacy # Used for visualizing NER results\n", "from wordcloud import WordCloud # Used for creating word clouds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need to download the spaCy English model for NER. This only needs to be done once" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:14:41.016923Z", "iopub.status.busy": "2026-02-23T20:14:41.016537Z", "iopub.status.idle": "2026-02-23T20:14:41.020604Z", "shell.execute_reply": "2026-02-23T20:14:41.019368Z" } }, "outputs": [], "source": [ "#!python -m spacy download en_core_web_sm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we download a dataset of central bank speeches. We are using the dataset from [cbspeeches.com](https://cbspeeches.com/) [@Campiglio2025], which contains a collection of speeches by central bankers from around the world." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:14:41.023126Z", "iopub.status.busy": "2026-02-23T20:14:41.022912Z", "iopub.status.idle": "2026-02-23T20:14:41.027679Z", "shell.execute_reply": "2026-02-23T20:14:41.026805Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset already downloaded!\n" ] } ], "source": [ "import urllib.request\n", "import os.path\n", "\n", "# Create the data folder if it doesn't exist\n", "os.makedirs(\"data\", exist_ok=True)\n", "\n", "# Check if the file exists\n", "if not os.path.isfile(\"data/CBS_dataset_v1.0.dta\"):\n", "\n", " print(\"Downloading dataset...\")\n", "\n", " # Define the dataset to be downloaded\n", " fileurl = \"https://www.dropbox.com/scl/fi/la5hpz39yht8mmoz0n98t/CBS_dataset_v1.0.dta?rlkey=jo0u8ktm1ixkwic4jw03re9c6&dl=1\"\n", "\n", " # Define the filename to save the dataset\n", " filename = \"data/CBS_dataset_v1.0.dta\"\n", "\n", " # Download the dataset in the data folder\n", " urllib.request.urlretrieve(fileurl, filename)\n", "\n", " print(\"DONE!\")\n", "\n", "else:\n", "\n", " print(\"Dataset already downloaded!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we load the dataset into a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:14:41.060339Z", "iopub.status.busy": "2026-02-23T20:14:41.060151Z", "iopub.status.idle": "2026-02-23T20:14:42.343840Z", "shell.execute_reply": "2026-02-23T20:14:42.342410Z" } }, "outputs": [], "source": [ "speeches = pd.read_stata(\"data/CBS_dataset_v1.0.dta\")\n", "speeches = speeches.set_index(\"index\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first filter the dataset to only include ECB speeches" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:14:42.346672Z", "iopub.status.busy": 
"2026-02-23T20:14:42.346504Z", "iopub.status.idle": "2026-02-23T20:14:42.366330Z", "shell.execute_reply": "2026-02-23T20:14:42.365471Z" } }, "outputs": [ { "data": { "text/html": [ "
| index | URL | | Title | Subtitle | Date | Authorname | Role | Gender | CentralBank | Country | text | text_original | Filename | Language | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2976 | https://www.ecb.europa.eu/press/key/date/2000/... | | One year with the euro | Speech delivered by Dr Sirkka Hmlinen, Mem... | 2000-01-08 | Sirkka Hmlinen | Board member | Female | European Central Bank | ECB | One year with the euro Speech delivered by Dr ... | | ecb_000108.en | English | CB websites |
| 2977 | https://www.ecb.europa.eu/press/key/date/2000/... | | Opening Remarks at a Hearing of the Committee ... | Professor Otmar Issing, Member of the Board o... | 2000-01-10 | Otmar Issing | Board member | Male | European Central Bank | ECB | Opening Remarks at a Hearing of the Committee ... | | ecb_000110.en | English | CB websites |
| 2978 | https://www.ecb.europa.eu/press/key/date/2000/... | | The international impact of the euro | Speech delivered by Christian Noyer, Vice... | 2000-01-13 | Christian Noyer | Deputy Governor | Male | European Central Bank | ECB | The international impact of the euro Speech de... | | ecb_000113.en | English | BIS |
| 2979 | https://www.ecb.europa.eu/press/key/date/2000/... | | The role of the central bank in encouraging an... | Speech given by Christian Noyer, Vice-Pres... | 2000-01-21 | Christian Noyer | Deputy Governor | Male | European Central Bank | ECB | The role of the central bank in encouraging an... | | ecb_000121.en | English | BIS |
| 2980 | https://www.ecb.europa.eu/press/key/date/2000/... | | The euro area - first experience and perspectives | by Professor Otmar Issing, Member of the Boar... | 2000-01-26 | Otmar Issing | Board member | Male | European Central Bank | ECB | The euro area - first experience and perspecti... | | ecb_000126.en | English | CB websites |