{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import the necessary libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:15:46.206217Z", "iopub.status.busy": "2026-02-23T20:15:46.206008Z", "iopub.status.idle": "2026-02-23T20:15:53.767744Z", "shell.execute_reply": "2026-02-23T20:15:53.766985Z" } }, "outputs": [], "source": [ "import pandas as pd # Used for data manipulation\n", "import matplotlib.pyplot as plt # Used for plotting\n", "import seaborn as sns # Used for plotting\n", "from huggingface_hub import login # Used to log in to Hugging Face and access datasets\n", "from sentence_transformers import SentenceTransformer # Used for encoding sentences into vector representations\n", "from transformers import pipeline # Used for using pre-trained models from Hugging Face for sentiment analysis\n", "from sklearn.model_selection import train_test_split # Used for splitting the dataset into training and testing sets\n", "from sklearn.ensemble import RandomForestClassifier # Used for training a Random Forest classifier\n", "from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, recall_score, precision_score # Used for evaluating the performance of the model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will again use a pre-labeled dataset for sentence-level sentiment analysis of ECB speeches [@Pfeifer2023], which is available on Hugging Face ([Central Bank Communication Dataset](https://huggingface.co/datasets/Moritz-Pfeifer/CentralBankCommunication)). 
The dataset contains sentences from ECB speeches that have been labeled as positive or negative in terms of sentiment.\n", "\n", "Let's load the dataset into a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:15:53.770696Z", "iopub.status.busy": "2026-02-23T20:15:53.770371Z", "iopub.status.idle": "2026-02-23T20:15:54.671819Z", "shell.execute_reply": "2026-02-23T20:15:54.670875Z" } }, "outputs": [], "source": [ "df = pd.read_csv(\"hf://datasets/Moritz-Pfeifer/CentralBankCommunication/Sentiment/ECB_prelabelled_sent.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we have explored the dataset in the NLP chapter, we will not go into detail about its structure here. Instead, we will focus on how to encode the sentences into vector representations using a pre-trained sentence transformer model.\n", "\n", "\n", "### Encoding Sentences with a Pre-trained Sentence Transformer\n", "\n", "Suppose we want to get a dense vector representation of sentences that we can use for various downstream tasks, e.g., as input features for a machine learning model. We can use a pre-trained sentence transformer model from the Hugging Face library to achieve this.\n", "\n", "First, we need to load the pre-trained model. We will use the \"all-MiniLM-L6-v2\" model, which is a small embedding model that does not require a GPU and can be run on a CPU. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:15:54.674321Z", "iopub.status.busy": "2026-02-23T20:15:54.674124Z", "iopub.status.idle": "2026-02-23T20:15:58.655280Z", "shell.execute_reply": "2026-02-23T20:15:58.653957Z" } }, "outputs": [], "source": [ "model = SentenceTransformer(\"all-MiniLM-L6-v2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider the following example sentences" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:15:58.657910Z", "iopub.status.busy": "2026-02-23T20:15:58.657530Z", "iopub.status.idle": "2026-02-23T20:15:58.661854Z", "shell.execute_reply": "2026-02-23T20:15:58.660504Z" } }, "outputs": [], "source": [ "sentences = [\n", " \"The ECB's monetary policy is not very effective for stabilizing the economy.\",\n", " \"The ECB's monetary policy is very ineffective for stabilizing the economy.\"\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can encode these sentences into vector representations using the `encode` method of the model. This will give us a dense vector for each sentence." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:15:58.664238Z", "iopub.status.busy": "2026-02-23T20:15:58.664074Z", "iopub.status.idle": "2026-02-23T20:15:59.703250Z", "shell.execute_reply": "2026-02-23T20:15:59.702057Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2, 384)\n" ] } ], "source": [ "embeddings = model.encode(sentences)\n", "print(embeddings.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output will show the shape of the embeddings, which should be (2, 384) since we have 2 sentences and the \"all-MiniLM-L6-v2\" model produces 384-dimensional embeddings. 
We can also compute the cosine similarity between the embeddings of the two sentences to see how similar they are in terms of their vector representations." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:15:59.736588Z", "iopub.status.busy": "2026-02-23T20:15:59.736355Z", "iopub.status.idle": "2026-02-23T20:15:59.743268Z", "shell.execute_reply": "2026-02-23T20:15:59.742483Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1.0000, 0.9408],\n", " [0.9408, 1.0000]])\n" ] } ], "source": [ "similarities = model.similarity(embeddings, embeddings)\n", "print(similarities)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now apply this encoding to the sentences in our dataset. We will encode all sentences in the \"text\" column of our DataFrame and store the resulting embeddings in a new column called \"embedding\"." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:15:59.745392Z", "iopub.status.busy": "2026-02-23T20:15:59.745204Z", "iopub.status.idle": "2026-02-23T20:16:04.204374Z", "shell.execute_reply": "2026-02-23T20:16:04.203428Z" } }, "outputs": [], "source": [ "df[\"embedding\"] = list(model.encode(df[\"text\"].tolist()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we did not preprocess the text before encoding. The sentence transformer model accepts raw text, so we do not need to tokenize or otherwise clean the sentences ourselves. The model tokenizes its input internally when encoding the sentences.\n", "\n", "\n", "### Using Sentence Embeddings for Sentiment Analysis\n", "\n", "Let's use the sentence embeddings as input features for a machine learning model to perform sentiment analysis. We will use a simple Random Forest classifier for this task. 
First, we need to split the dataset into a training set and a testing set" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:04.206730Z", "iopub.status.busy": "2026-02-23T20:16:04.206464Z", "iopub.status.idle": "2026-02-23T20:16:04.213779Z", "shell.execute_reply": "2026-02-23T20:16:04.212976Z" } }, "outputs": [], "source": [ "X = df['embedding'].to_list() # Convert the embeddings from a pandas Series to a list of numpy arrays\n", "y = df['sentiment']\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y) # We use 20% of the data for testing, set a random state for reproducibility, and stratify to maintain class balance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we can train a Random Forest classifier on the training data" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:04.215919Z", "iopub.status.busy": "2026-02-23T20:16:04.215730Z", "iopub.status.idle": "2026-02-23T20:16:06.089509Z", "shell.execute_reply": "2026-02-23T20:16:06.088701Z" } }, "outputs": [], "source": [ "clf_rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To evaluate the performance of the model, we can make predictions on the testing set and calculate metrics such as accuracy, precision, recall, and ROC AUC" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:06.091565Z", "iopub.status.busy": "2026-02-23T20:16:06.091386Z", "iopub.status.idle": "2026-02-23T20:16:06.130947Z", "shell.execute_reply": "2026-02-23T20:16:06.129503Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.9512670565302144\n", "Precision: 0.9560439560439561\n", "Recall: 0.9109947643979057\n", "ROC AUC: 0.9940977529186044\n" 
] } ], "source": [ "y_pred_rf = clf_rf.predict(X_test)\n", "y_proba_rf = clf_rf.predict_proba(X_test)\n", "\n", "print(f\"Accuracy: {accuracy_score(y_test, y_pred_rf)}\")\n", "print(f\"Precision: {precision_score(y_test, y_pred_rf)}\")\n", "print(f\"Recall: {recall_score(y_test, y_pred_rf)}\")\n", "print(f\"ROC AUC: {roc_auc_score(y_test, y_proba_rf[:, 1])}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our model performs much better than the one trained on the classical TF-IDF features in the NLP chapter. This suggests that the sentence embeddings from the pre-trained model capture more meaningful information about the sentences, which allows the Random Forest classifier to make better predictions of their sentiment.\n", "\n", "We can also look at the confusion matrix to see how well the model is performing in terms of true positives, true negatives, false positives, and false negatives" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:06.132953Z", "iopub.status.busy": "2026-02-23T20:16:06.132797Z", "iopub.status.idle": "2026-02-23T20:16:06.260244Z", "shell.execute_reply": "2026-02-23T20:16:06.258995Z" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAhsAAAGwCAYAAAAAFKcNAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAQTRJREFUeJzt3Xd4VGX6//HPpE0KSSSkYwihKV0FpEqTIojKotKCgiJI30iJG11XECXCKkVYcZdF2sKCLsUCuKICGlHpUkWUgCDJBiEEQkJIwvn9wc/5OgY0ZU5mknm/9jrXlXnOM2fuyW7c2/tpFsMwDAEAAJjEw9kBAACAyo1kAwAAmIpkAwAAmIpkAwAAmIpkAwAAmIpkAwAAmIpkAwAAmIpkAwAAmMrL2QGY4dEV+5wdAuCS/tG3ibNDAFyObzn8P6Hf7WMc8pzcPfMc8pzyRmUDAACYqlJWNgAAcCkW9/53e5INAADMZrE4OwKnItkAAMBsbl7ZcO9vDwAATEdlAwAAszGMAgAATMUwCgAAgHmobAAAYDaGUQAAgKkYRgEAADAPlQ0AAMzm5sMoVDYAADCbxcMxVwnMnz9fTZo0UVBQkIKCgtS6dWtt3LjRdt8wDE2ePFnR0dHy8/NTx44ddfDgQbtn5OXlaezYsQoNDVVAQIDuv/9+nTp1qsRfn2QDAIBK6Oabb9bLL7+snTt3aufOnercubMeeOABW0IxY8YMzZw5U/PmzdOOHTsUGRmprl276uLFi7ZnJCQkaO3atVq5cqVSUlKUnZ2tXr16qbCwsESxWAzDMBz67VwAR8wD18cR80BR5XLEfNtnHfKc3M9fKtP7Q0JC9Ne//lWPP/64oqOjlZCQoKefflrStSpGRESEpk+frieffFJZWVkKCwvTsmXL1K9fP0nS6dOnFRMTow0bNqh79+7F/lwqGwAAmM1Bwyh5eXm6cOGC3ZWXl/e7H19YWKiVK1fq0qVLat26tVJTU5Wenq5u3brZ+litVnXo0EHbtm2TJO3atUv5+fl2faKjo9WoUSNbn+Ii2QAAwGwWi0Ou5ORkBQcH213Jyck3/Nj9+/erSpUqslqtGjFihNauXasGDRooPT1dkhQREWHXPyIiwnYvPT1dPj4+qlq16g37FBerUQAAqCCSkpI0fvx4uzar1XrD/rfccov27t2r8+fPa/Xq1Ro8eLC2bt1qu2/51SoZwzCKtP1acfr8GskGAABmc9CmXlar9TeTi1/z8fFRnTp1JEnNmzfXjh07NGfOHNs8jfT0dEVFRdn6Z2Rk2KodkZGRunLlijIzM+2qGxkZGWrTpk2J4mYYBQAAszlh6ev1GIahvLw8xcXFKTIyUps2bbLdu3LlirZu3WpLJJo1ayZvb2+7PmlpaTpw4ECJkw0qGwAAVELPPPOMevTooZiYGF28eFErV67Uli1b9MEHH8hisSghIUHTpk1T3bp1VbduXU2bNk3+/v4aOHCgJCk4OFhDhw7VhAkTVK1aNYWEhGjixIlq3LixunTpUqJYSDYAADCbR/nvIPq///1PjzzyiNLS0hQcHKwmTZrogw8+UNeuXSVJiYmJys3N1ahRo5SZmamWLVvqww8/VGBgoO0Zs2bNkpeXl/r27avc3FzdfffdWrx4sTw9PUsUC/tsAG6EfTaAospln43OZdsf42e5nzhmv47yxpwNAABgKoZRAAAwm5sfxEayAQCA2Ry09LWicu9vDwAATEdlAwAAszGMAgAATOXmwygkGwAAmM3NKxvunWoBAADTUdkAAMBsDKMAAABTMYwCAABgHiobAACYjWEUAABgKoZRAAAAzENlAwAAszGMAgAATOXmyYZ7f3sAAGA6KhsAAJjNzSeIkmwAAGA2Nx9GIdkAAMBsbl7ZcO9UCwAAmI7KBgAAZmMYBQAAmIphFAAAAPNQ2QAAwGQWN69skGwAAGAyd082GEYBAACmorIBAIDZ3LuwQbIBAIDZGEYBAAAwEZUNAABM5u6VDZI
NAABMRrIBAABM5e7JBnM2AACAqahsAABgNvcubJBsAABgNoZRAAAATERlAwAAk7l7ZYNkAwAAk7l7ssEwCgAAMBWVDQAATObulQ2SDQAAzObeuQbDKAAAwFxUNgAAMBnDKAAAwFQkGwAAwFTunmy4zJyNZcuWqW3btoqOjtaJEyckSbNnz9Y777zj5MgAAEBZuESyMX/+fI0fP149e/bU+fPnVVhYKEm66aabNHv2bOcGBwBAWVkcdFVQLpFszJ07VwsWLNCzzz4rT09PW3vz5s21f/9+J0YGAEDZWSwWh1wVlUskG6mpqbr99tuLtFutVl26dMkJEQEAAEdxiWQjLi5Oe/fuLdK+ceNGNWjQoPwDAgDAgZxR2UhOTlaLFi0UGBio8PBw9e7dW0eOHLHrM2TIkCKf0apVK7s+eXl5Gjt2rEJDQxUQEKD7779fp06dKlEsLpFsTJo0SaNHj9aqVatkGIa2b9+ul156Sc8884wmTZrk7PAAACgTZyQbW7du1ejRo/Xll19q06ZNKigoULdu3YqMGNxzzz1KS0uzXRs2bLC7n5CQoLVr12rlypVKSUlRdna2evXqZZtfWRwusfT1scceU0FBgRITE5WTk6OBAweqevXqmjNnjvr37+/s8AAAcAl5eXnKy8uza7NarbJarUX6fvDBB3avFy1apPDwcO3atUvt27e3e39kZOR1Py8rK0sLFy7UsmXL1KVLF0nSv/71L8XExOijjz5S9+7dixW3S1Q2JGnYsGE6ceKEMjIylJ6erpMnT2ro0KHODgsAgDJzVGUjOTlZwcHBdldycnKxYsjKypIkhYSE2LVv2bJF4eHhqlevnoYNG6aMjAzbvV27dik/P1/dunWztUVHR6tRo0batm1bsb+/SyQbU6ZM0ffffy9JCg0NVXh4uJMjAgDAgRy09DUpKUlZWVl2V1JS0u9+vGEYGj9+vNq1a6dGjRrZ2nv06KHly5frk08+0auvvqodO3aoc+fOtupJenq6fHx8VLVqVbvnRUREKD09vdhf3yWGUVavXq0XXnhBLVq00KBBg9SvXz+FhYU5OywAAFzKjYZMfs+YMWO0b98+paSk2LX369fP9nOjRo3UvHlzxcbGav369erTp88Nn2cYRonmkLhEZWPfvn3at2+fOnfurJkzZ6p69erq2bOnVqxYoZycHGeHBwBAmThzn42xY8fq3Xff1ebNm3XzzTf/Zt+oqCjFxsbq6NGjkqTIyEhduXJFmZmZdv0yMjIUERFR7BhcItmQpIYNG2ratGk6duyYNm/erLi4OCUkJNxw0goAABWFM5INwzA0ZswYrVmzRp988oni4uJ+9z1nz57VyZMnFRUVJUlq1qyZvL29tWnTJluftLQ0HThwQG3atCl2LC4xjPJrAQEB8vPzk4+Pjy5evOjscAAAKBNn7P45evRorVixQu+8844CAwNtcyyCg4Pl5+en7OxsTZ48WQ8++KCioqJ0/PhxPfPMMwoNDdUf/vAHW9+hQ4dqwoQJqlatmkJCQjRx4kQ1btzYtjqlOFymspGamqqXXnpJDRo0UPPmzbV7925Nnjy5RBNQAADANfPnz1dWVpY6duyoqKgo27Vq1SpJkqenp/bv368HHnhA9erV0+DBg1WvXj198cUXCgwMtD1n1qxZ6t27t/r27au2bdvK399f7733nt3xIr/HYhiG4fBvWEKtW7fW9u3b1bhxY8XHx9v22SitR1fsc2B0QOXxj75NnB0C4HJ8y6HGHzPGMSeYn5z3gEOeU95cYhilU6dO+uc//6mGDRs6OxQAAByuIh+i5ggukWxMmzbN2SEAAACTOC3ZGD9+vKZOnaqAgACNHz/+N/vOnDmznKJCcdwSFqCeDcJUs6qfqvp7a/anx7X71AXb/aUDr1+qX7knTRsOnynSPqFjTTWNDiryHKAyKSgo0Bt/m6v169/T2Z9+UmhYmO5/4A8aPmKUPDxcZvocTEJlw0n27Nmj/Px828+oOKx
eHvohM1effX9O49rXLHJ/7JpDdq+bRAdqaMubteOHrCJ9u98SalaYgEtZtHCB3n5rpaZOm67adero0IED+sufkxQYGKj4RwY7OzyYjGTDSTZv3nzdn+H69qVd1L60Gy9JzrpcYPf6jupBOvy/bJ25dMWuPeYmX91za6gm//c7ze3TwJRYAVfx9dd71bHz3WrfoaMkqXr1m7Vxw3odPHjAuYEB5cAlanePP/74dffTuHTpkh5//HEnRARHCfL1UtPqQfr0e/vd53w8LRrVtoaW7TxdJDkBKqPbb2+m7V9+qePHUyVJR775Rnv27NJdd3VwcmQoD87cQdQVuESysWTJEuXm5hZpz83N1dKlS50QERylXVxVXc4v1M6T9kMoA++I1tEzOdr9I3M04B4ef2KY7ul5r3r36qFmTRuq30O9NeiRwepxby9nh4by4KCD2Coqp65GuXDhggzDkGEYunjxonx9fW33CgsLtWHDht89ATYvL892Op3tvflX5OntY0rMKJn2tarqi+PnlX/1/7Zzub16kBpEVtFzG486MTKgfH2wcYPWv/+ukme8qjp16uibbw7rry8nKywsXPf3/oOzwwNM5dRk46abbrKVhurVq1fkvsVi0ZQpU37zGcnJyUX6NOkzQk0fHOnQWFFy9cL8FR3sq799/oNde4OIAIVX8dEbD9nvqzKuXayOnLmk5I+PlWeYQLmY9eoMPT50uHr0vFeSVLfeLUo7fVoL//l3kg03UJGHQBzBqcnG5s2bZRiGOnfurNWrVyskJMR2z8fHR7GxsYqOjv7NZyQlJRVZOjty7bemxIuS6VA7RKlnc3Ty/GW79vcPndGW78/ZtSXfe4uW7z6tPQyroJK6nHtZHh72/4fj6empq1edvokzygHJhhN16HBtYlRqaqpq1KhRqv8yrFarrFarXRtDKOayenkoosr//Y7DAnxU4yZfXbpSqLM515Yz+3p56M4aN2nF7tNF3p91ueC6k0LP5uTrp0v55gUOOFGHjp204B9vKDIqWrXr1NE3hw9r2ZJFeuAPDzo7NJQDN881nJds7Nu3T40aNZKHh4eysrK0f//+G/Zt0oTzHFxJXIifnulS2/Y6vtm16tNnx85pwZenJEmtYm+SJH154nx5hwe4pD89+2f97bU5mjZ1is6dO6uw8HA99HA/PTlytLNDA0zntIPYPDw8lJ6ervDwcHl4eMhiseh6oVgsFhUWFpbo2RzEBlwfB7EBRZXHQWx1J33gkOcc/es9DnlOeXNaZSM1NVVhYWG2nwEAqKwYRnGS2NjY6/4MAAAqF5fZ1Gv9+vW214mJibrpppvUpk0bnThxwomRAQBQduwg6gKmTZsmPz8/SdIXX3yhefPmacaMGQoNDdVTTz3l5OgAACgbi8UxV0Xl1KWvPzt58qTq1KkjSVq3bp0eeughDR8+XG3btlXHjh2dGxwAACgTl6hsVKlSRWfPnpUkffjhh+rSpYskydfX97pnpgAAUJF4eFgcclVULlHZ6Nq1q5544gndfvvt+vbbb3Xvvde28z148KBq1qzp3OAAACijijwE4gguUdn429/+ptatW+vMmTNavXq1qlWrJknatWuXBgwY4OToAABAWbhEZeOmm27SvHnzirT/3iFsAABUBBV5JYkjuESyIUnnz5/XwoULdfjwYVksFtWvX19Dhw5VcHCws0MDAKBM3DzXcI1hlJ07d6p27dqaNWuWzp07p59++kmzZs1S7dq1tXv3bmeHBwBAmbj7PhsuUdl46qmndP/992vBggXy8roWUkFBgZ544gklJCTo008/dXKEAACgtFwi2di5c6ddoiFJXl5eSkxMVPPmzZ0YGQAAZVeRqxKO4BLDKEFBQfrhhx+KtJ88eVKBgYFOiAgAAMdx9x1EXSLZ6Nevn4YOHapVq1bp5MmTOnXqlFauXKknnniCpa8AAFRwLjGM8sorr8jDw0OPPvqoCgoKJEne3t4aOXKkXn75ZSdHBwBA2bj
7MIpTk42cnBxNmjRJ69atU35+vnr37q0xY8YoODhYderUkb+/vzPDAwDAIdw813BusvH8889r8eLFio+Pl5+fn1asWKGrV6/q7bffdmZYAADAgZyabKxZs0YLFy5U//79JUnx8fFq27atCgsL5enp6czQAABwGHcfRnHqBNGTJ0/qrrvusr2+88475eXlpdOnTzsxKgAAHIvVKE5UWFgoHx8fuzYvLy/bJFEAAFDxOXUYxTAMDRkyRFar1dZ2+fJljRgxQgEBAba2NWvWOCM8AAAcwt2HUZyabAwePLhI26BBg5wQCQAA5nHzXMO5ycaiRYuc+fEAAJQLd69suMQOogAAoPJyiR1EAQCozNy8sEGyAQCA2RhGAQAAMBGVDQAATObmhQ2SDQAAzMYwCgAAgImobAAAYDI3L2yQbAAAYDaGUQAAAExEZQMAAJO5e2WDZAMAAJO5ea7BMAoAAGazWCwOuUoiOTlZLVq0UGBgoMLDw9W7d28dOXLEro9hGJo8ebKio6Pl5+enjh076uDBg3Z98vLyNHbsWIWGhiogIED333+/Tp06VaJYSDYAAKiEtm7dqtGjR+vLL7/Upk2bVFBQoG7duunSpUu2PjNmzNDMmTM1b9487dixQ5GRkeratasuXrxo65OQkKC1a9dq5cqVSklJUXZ2tnr16qXCwsJix2IxDMNw6LdzAY+u2OfsEACX9I++TZwdAuByfMthQkGnOdsc8pzNf2xT6veeOXNG4eHh2rp1q9q3by/DMBQdHa2EhAQ9/fTTkq5VMSIiIjR9+nQ9+eSTysrKUlhYmJYtW6Z+/fpJkk6fPq2YmBht2LBB3bt3L9ZnU9kAAMBkjhpGycvL04ULF+yuvLy8YsWQlZUlSQoJCZEkpaamKj09Xd26dbP1sVqt6tChg7Ztu5Yc7dq1S/n5+XZ9oqOj1ahRI1uf4iDZAACggkhOTlZwcLDdlZyc/LvvMwxD48ePV7t27dSoUSNJUnp6uiQpIiLCrm9ERITtXnp6unx8fFS1atUb9ikOVqMAAGAyR61GSUpK0vjx4+3arFbr775vzJgx2rdvn1JSUq4Tm31whmH87mTU4vT5JZINAABM5uGgbMNqtRYrufilsWPH6t1339Wnn36qm2++2dYeGRkp6Vr1IioqytaekZFhq3ZERkbqypUryszMtKtuZGRkqE2b4s8fYRgFAIBKyDAMjRkzRmvWrNEnn3yiuLg4u/txcXGKjIzUpk2bbG1XrlzR1q1bbYlEs2bN5O3tbdcnLS1NBw4cKFGyQWUDAACTOWNTr9GjR2vFihV65513FBgYaJtjERwcLD8/P1ksFiUkJGjatGmqW7eu6tatq2nTpsnf318DBw609R06dKgmTJigatWqKSQkRBMnTlTjxo3VpUuXYsdCsgEAgMmcsV35/PnzJUkdO3a0a1+0aJGGDBkiSUpMTFRubq5GjRqlzMxMtWzZUh9++KECAwNt/WfNmiUvLy/17dtXubm5uvvuu7V48WJ5enoWOxb22QDcCPtsAEWVxz4bPeZ/5ZDnbBzZ0iHPKW/M2QAAAKZiGAUAAJNx6isAADCVm+caDKMAAABzUdkAAMBkFrl3aYNkAwAAk3m4d67BMAoAADAXlQ0AAEzGahQAAGAqN881GEYBAADmorIBAIDJHHXEfEVFsgEAgMncPNcofrKxb1/xDzdr0oTDngAA+BkTRIvptttuk8VikWEYv/tLKywsLHNgAACgcij2BNHU1FQdO3ZMqampWr16teLi4vT6669rz5492rNnj15//XXVrl1bq1evNjNeAAAqHIvFMVdFVezKRmxsrO3nhx9+WK+99pp69uxpa2vSpIliYmL03HPPqXfv3g4NEgCAiszdJ4iWaunr/v37FRcXV6Q9Li5Ohw4dKnNQAACg8ihVslG/fn29+OKLunz5sq0tLy9PL774ourXr++w4AAAqAwsDroqqlItfX3jjTd03333KSYmRk2bNpUkff3117J
YLHr//fcdGiAAABUdq1FK4c4771Rqaqr+9a9/6ZtvvpFhGOrXr58GDhyogIAAR8cIAAAqsFJv6uXv76/hw4c7MhYAAColjpgvpWXLlqldu3aKjo7WiRMnJEmzZs3SO++847DgAACoDCwWi0OuiqpUycb8+fM1fvx49ejRQ5mZmbZNvKpWrarZs2c7Mj4AAFDBlSrZmDt3rhYsWKBnn31WXl7/NxLTvHlz7d+/32HBAQBQGbCpVymkpqbq9ttvL9JutVp16dKlMgcFAEBlUpGHQByhVJWNuLg47d27t0j7xo0b1aBBg7LGBABApeJhccxVUZWqsjFp0iSNHj1aly9flmEY2r59u/79738rOTlZ//znPx0dIwAAqMBKlWw89thjKigoUGJionJycjRw4EBVr15dc+bMUf/+/R0dIwAAFZq7D6OUep+NYcOGadiwYfrpp5909epVhYeHOzIuAAAqDfdONUo5Z6Nz5846f/68JCk0NNSWaFy4cEGdO3d2WHAAAKDiK1VlY8uWLbpy5UqR9suXL+uzzz4rc1AAAFQm7n7EfImSjX379tl+PnTokNLT022vCwsL9cEHH6h69eqOiw4AgErAzXONkiUbt912m23L1OsNl/j5+Wnu3LkOCw4AAFR8JUo2UlNTZRiGatWqpe3btyssLMx2z8fHR+Hh4fL09HR4kAAAVGSsRimB2NhYSdLVq1dNCQYAgMrIzXON0q1GSU5O1ptvvlmk/c0339T06dPLHBQAAKg8SpVs/P3vf9ett95apL1hw4Z64403yhwUAACViYfF4pCroirV0tf09HRFRUUVaQ8LC1NaWlqZgwIAoDKpwHmCQ5SqshETE6PPP/+8SPvnn3+u6OjoMgcFAEBl8vNKzrJeFVWpKhtPPPGEEhISlJ+fb1sC+/HHHysxMVETJkxwaIAAAKBiK1WykZiYqHPnzmnUqFG2nUR9fX319NNPKykpyaEBlsbf+jRydgiAS6raYoyzQwBcTu6eeaZ/RqmGESqRUiUbFotF06dP13PPPafDhw/Lz89PdevWldVqdXR8AABUeBV5CMQRSn3qqyRVqVJFLVq0cFQsAACgEip2stGnTx8tXrxYQUFB6tOnz2/2XbNmTZkDAwCgsvBw78JG8ZON4OBgWxkoODjYtIAAAKhsSDaKadGiRdf9GQAA4LeUac4GAAD4fUwQLabbb7+92L+s3bt3lzogAAAqG4ZRiql37962ny9fvqzXX39dDRo0UOvWrSVJX375pQ4ePKhRo0Y5PEgAAFBxFTvZeP75520/P/HEExo3bpymTp1apM/JkycdFx0AAJWAm4+ilG5Ts7fffluPPvpokfZBgwZp9erVZQ4KAIDKxFmnvn766ae67777FB0dLYvFonXr1tndHzJkSJHzV1q1amXXJy8vT2PHjlVoaKgCAgJ0//3369SpUyX7/iWOXJKfn59SUlKKtKekpMjX17c0jwQAoNLycNBVUpcuXVLTpk01b96Nt2S/5557lJaWZrs2bNhgdz8hIUFr167VypUrlZKSouzsbPXq1UuFhYXFjqNUq1ESEhI0cuRI7dq1y5YBffnll3rzzTf1l7/8pTSPBAAAvyMvL095eXl2bVar9YbHhfTo0UM9evT4zWdarVZFRkZe915WVpYWLlyoZcuWqUuXLpKkf/3rX4qJidFHH32k7t27FyvuUlU2/vSnP2np0qXas2ePxo0bp3HjxmnPnj1avHix/vSnP5XmkQAAVFoWi2Ou5ORkBQcH213Jycllim3Lli0KDw9XvXr1NGzYMGVkZNju7dq1S/n5+erWrZutLTo6Wo0aNdK2bduK/Rml3mejb9++6tu3b2nfDgCA2yjNfIvrSUpK0vjx4+3aynIIao8ePfTwww8rNjZWqampeu6559S5c2ft2rVLVqtV6enp8vHxUdWqVe3eFxERofT09GJ/TqmTjfPnz+s///mPjh07pokTJyokJES7d+9WRESEqlevXtrHAgC
AG/itIZPS6Nevn+3nRo0aqXnz5oqNjdX69et/8xw0wzBKtFFZqZKNffv2qUuXLgoODtbx48f1xBNPKCQkRGvXrtWJEye0dOnS0jwWAIBKqaIsfY2KilJsbKyOHj0qSYqMjNSVK1eUmZlpV93IyMhQmzZtiv3cUs3ZGD9+vIYMGaKjR4/arT7p0aOHPv3009I8EgCASsvD4pjLbGfPntXJkycVFRUlSWrWrJm8vb21adMmW5+0tDQdOHCgRMlGqSobO3bs0N///vci7dWrVy/RGA4AADBPdna2vvvuO9vr1NRU7d27VyEhIQoJCdHkyZP14IMPKioqSsePH9czzzyj0NBQ/eEPf5B07ZT3oUOHasKECapWrZpCQkI0ceJENW7c2LY6pThKlWz4+vrqwoULRdqPHDmisLCw0jwSAIBKy1ETREtq586d6tSpk+31z5NLBw8erPnz52v//v1aunSpzp8/r6ioKHXq1EmrVq1SYGCg7T2zZs2Sl5eX+vbtq9zcXN19991avHixPD09ix2HxTAMo6TBDx8+XGfOnNFbb72lkJAQ7du3T56enurdu7fat2+v2bNnl/SRDnXx8lWnfj7gqsJbj3N2CIDLyd1z4w2vHGXqR9/9fqdieK5LHYc8p7yVas7GK6+8ojNnzig8PFy5ubnq0KGD6tSpo8DAQL300kuOjhEAAFRgpRpGCQoKUkpKij755BPt3r1bV69e1R133FGi8RsAANwFR8yXUEFBgXx9fbV371517txZnTt3NiMuAAAqDYvcO9socbLh5eWl2NjYEh3AAgCAO3P3ykap5mz8+c9/VlJSks6dO+foeAAAQCVTqjkbr732mr777jtFR0crNjZWAQEBdvd3797tkOAAAKgM3L2yUapko3fv3rJYLCrFqlkAANxOSc4RqYxKlGzk5ORo0qRJWrdunfLz83X33Xdr7ty5Cg0NNSs+AABQwZVozsbzzz+vxYsX695779WAAQP00UcfaeTIkWbFBgBApVBRzkYxS4kqG2vWrNHChQvVv39/SVJ8fLzatm2rwsLCEm1bCgCAO3HzUZSSVTZOnjypu+66y/b6zjvvlJeXl06fPu3wwAAAQOVQospGYWGhfHx87B/g5aWCggKHBgUAQGXirIPYXEWJkg3DMDRkyBBZrVZb2+XLlzVixAi75a9r1qxxXIQAAFRwFXm+hSOUKNkYPHhwkbZBgwY5LBgAAFD5lCjZWLRokVlxAABQabn5KErpNvUCAADF58FBbAAAwEzuXtko1UFsAAAAxUVlAwAAk7EaBQAAmMrd99lgGAUAAJiKygYAACZz88IGyQYAAGZjGAUAAMBEVDYAADCZmxc2SDYAADCbuw8juPv3BwAAJqOyAQCAySxuPo5CsgEAgMncO9Ug2QAAwHQsfQUAADARlQ0AAEzm3nUNkg0AAEzn5qMoDKMAAABzUdkAAMBkLH0FAACmcvdhBHf//gAAwGRUNgAAMBnDKAAAwFTunWowjAIAAExGZQMAAJMxjAIAAEzl7sMIJBsAAJjM3Ssb7p5sAQAAk7lMsvHZZ59p0KBBat26tX788UdJ0rJly5SSkuLkyAAAKBuLg66KyiWSjdWrV6t79+7y8/PTnj17lJeXJ0m6ePGipk2b5uToAAAoG4vFMVdF5RLJxosvvqg33nhDCxYskLe3t629TZs22r17txMjAwAAZeUSE0SPHDmi9u3bF2kPCgrS+fPnyz8gAAAcyKNCD4KUnUtUNqKiovTdd98VaU9JSVGtWrWcEBEAAI7DMIoLePLJJ/XHP/5RX331lSwWi06fPq3ly5dr4sSJGjVqlLPDAwCgQvr000913333KTo6WhaLRevWrbO7bxiGJk+erOjoaPn5+aljx446ePCgXZ+8vDyNHTtWoaGhCggI0P33369Tp06VKA6XSDYSExPVu3dvderUSdnZ2Wrfvr2eeOIJPfnkkxozZoyzwwMAoEwsDvpPSV26dElNmzbVvHnzrnt
/xowZmjlzpubNm6cdO3YoMjJSXbt21cWLF219EhIStHbtWq1cuVIpKSnKzs5Wr169VFhYWPzvbxiGUeLoTZKTk6NDhw7p6tWratCggapUqVKq51y8fNXBkQGVQ3jrcc4OAXA5uXuu/3/EjrThYIZDntOzYXip32uxWLR27Vr17t1b0rWqRnR0tBISEvT0009LulbFiIiI0PTp0/Xkk08qKytLYWFhWrZsmfr16ydJOn36tGJiYrRhwwZ17969WJ/tEpWNJUuW6NKlS/L391fz5s115513ljrRAACgssrLy9OFCxfsrp+3iyip1NRUpaenq1u3brY2q9WqDh06aNu2bZKkXbt2KT8/365PdHS0GjVqZOtTHC6RbEycOFHh4eHq37+/3n//fRUUFDg7JAAAHMZDFodcycnJCg4OtruSk5NLFVN6erokKSIiwq49IiLCdi89PV0+Pj6qWrXqDfsU7/u7gLS0NK1atUqenp7q37+/oqKiNGrUqBJlTQAAuCpHrUZJSkpSVlaW3ZWUlFTG2OznghiG8btnuRSnzy+5RLLh5eWlXr16afny5crIyNDs2bN14sQJderUSbVr13Z2eAAAlImjkg2r1aqgoCC7y2q1liqmyMhISSpSocjIyLBVOyIjI3XlyhVlZmbesE9xuESy8Uv+/v7q3r27evToobp16+r48ePODgkAgEonLi5OkZGR2rRpk63typUr2rp1q9q0aSNJatasmby9ve36pKWl6cCBA7Y+xeESO4hK11airF27VsuXL9dHH32kmJgYDRgwQG+//bazQwMAoExKs2zVEbKzs+02zUxNTdXevXsVEhKiGjVqKCEhQdOmTVPdunVVt25dTZs2Tf7+/ho4cKAkKTg4WEOHDtWECRNUrVo1hYSEaOLEiWrcuLG6dOlS7DhcItkYMGCA3nvvPfn7++vhhx/Wli1bSpQxAQDgyjyctPvnzp071alTJ9vr8ePHS5IGDx6sxYsXKzExUbm5uRo1apQyMzPVsmVLffjhhwoMDLS9Z9asWfLy8lLfvn2Vm5uru+++W4sXL5anp2ex43CJfTYGDhyo+Ph4de/eXV5eZc9/2GcDuD722QCKKo99Nj7+5ieHPOfuW0Md8pzy5hKVjRUrVjg7BAAATOOsYRRX4bRk47XXXtPw4cPl6+ur11577Tf7jhvHv40BACquinyImiM4bRglLi5OO3fuVLVq1RQXF3fDfhaLRceOHSvRsxlGAa6PYRSgqPIYRtl85KxDntPplmoOeU55c1plIzU19bo/AwBQ2bj7MIpL7LPxwgsvKCcnp0h7bm6uXnjhBSdEBACA43hYHHNVVC6RbEyZMkXZ2dlF2nNycjRlyhQnRAQAABzFJVaj3GiP9a+//lohISFOiAglsXvXDi1b/KYOHz6on86c0Suz5qpj5//b7KV50/rXfd+4pybq0SFDyytMwFTDHm6nYQ/dpdjoa//MOnwsXdP+sVEffn5IkvRA56Ya+mA73V4/RqFVq6hlv2Tt+/bHGz5v3byR6t62ofo+9Q+9t2VfuXwHmMfdh1GcmmxUrVpVFotFFotF9erVs0s4CgsLlZ2drREjRjgxQhRHbm6u6t5yi+574A9KnPDHIvc/+PhTu9fbUj7T1Ml/Vucu3Yr0BSqqH/93Xs/NfUff/3BtP4VB97XU27OGq1X/l3X4WLr8/Xz0xdffa81HuzX/L/G/+ayx8Z3k/B2Q4EjuvhrFqcnG7NmzZRiGHn/8cU2ZMkXBwcG2ez4+PqpZs6Zat27txAhRHG3btVfbdu1veD80NMzu9dYtn6h5i5a6+eYYs0MDys2GTw/YvZ78t/c07OF2urNJnA4fS9e/1++QJNWI+u1qbeN61TVuUGe1GzRDxz8q3dHhcD1unms4N9kYPHiwpGvLYNu0aSNvb29nhoNycPbsT0r5bKumTOUfoqi8PDwserDrHQrw89FX+4q/2s7P11tLkofoqelv6X9nL5oYIVC+XGLORoc
OHWw/5+bmKj8/3+5+UFDQDd+bl5envLw8u7Yrhnepj9yFud5/d50C/APU6e6uzg4FcLiGdaK1ZckE+fp4KTs3T/0mLNA3x9J//43/34wJD+rLr1P1/pb9JkYJZ/Bw83EUl1iNkpOTozFjxig8PFxVqlRR1apV7a7fkpycrODgYLvr1b++XE6Ro6TeXbdG9/TsRTKISunb4/9Ty/7J6jD4VS14O0ULXnhEt9aKLNZ77+3QWB3vrKdJf/2PyVHCGSwOuioql0g2Jk2apE8++USvv/66rFar/vnPf2rKlCmKjo7W0qVLf/O9SUlJysrKsrsmTPpTOUWOktize6dOHE9V7z4POTsUwBT5BYU6dvIn7T70g/4y913t//ZHjR7QsVjv7diinmrdHKr0T/+qizvm6OKOOZKkf7/yhP67oOjEa6AicYlhlPfee09Lly5Vx44d9fjjj+uuu+5SnTp1FBsbq+XLlys+/sYzt61Wa5F/S2a7ctf0ztrVqt+goerdcquzQwHKhUUWWX2K94/ZVxZ9qEVrt9m17frPs0p8dbXWbz1wg3ehwqjIZQkHcIlk49y5c7bzUYKCgnTu3DlJUrt27TRy5EhnhoZiyMm5pJM//GB7/eOPp3Tkm8MKDg5WZFS0JCk7O1sfffhfJUxIdFaYgKmmjLlPH35+SCfTMxUY4KuHuzdT++Z1df/o1yVJVYP8FRNZVVHh11bd1asZIUn639kL+t/Zi7br106mZerEacecqwHnYZ8NF1CrVi0dP35csbGxatCggd566y3deeedeu+993TTTTc5Ozz8jkMHD2rEE4Ntr2e9Ml2S1Ov+3pr8/1edfPjBBhkydE+Pe50SI2C28GqBWvjio4oMDVJW9mUdOPqj7h/9uj756htJ1+ZkLHjhEVv/ZdMflyS9+MYGvfT3DU6JGSgvTjv19ZdmzZolT09PjRs3Tps3b9a9996rwsJCFRQUaObMmfrjH0s2XskwCnB9nPoKFFUep75uP5blkOfcWSv49zu5IJeobDz11FO2nzt16qRvvvlGO3fuVO3atdW0aVMnRgYAQNm59yCKiyQbv1ajRg3VqFHD2WEAAAAHcIlk47XXXrtuu8Vika+vr+rUqaP27dvL09OznCMDAMAB3Ly04RLJxqxZs3TmzBnl5OSoatWqMgxD58+fl7+/v6pUqaKMjAzVqlVLmzdvVkwM52kAACoWd1+N4hKbek2bNk0tWrTQ0aNHdfbsWZ07d07ffvutWrZsqTlz5uiHH35QZGSk3dwOAAAqCovFMVdF5RKrUWrXrq3Vq1frtttus2vfs2ePHnzwQR07dkzbtm3Tgw8+qLS0tN99HqtRgOtjNQpQVHmsRtl1/IJDntOs5o3PCnNlLjGMkpaWpoKCgiLtBQUFSk+/dohRdHS0Ll7kFEQAQMVTgYsSDuESwyidOnXSk08+qT179tja9uzZo5EjR6pz586SpP3799t2GQUAoEJx85PYXCLZWLhwoUJCQtSsWTPbWSfNmzdXSEiIFi5cKEmqUqWKXn31VSdHCgAASsolhlEiIyO1adMmffPNN/r2229lGIZuvfVW3XLLLbY+nTp1cmKEAACUnruvRnGJZONntWrVksViUe3ateXl5VKhAQBQahV5JYkjuMQwSk5OjoYOHSp/f381bNhQP/z/E0THjRunl19+2cnRAQCAsnCJZCMpKUlff/21tmzZIl9fX1t7ly5dtGrVKidGBgBA2bn5/FDXGEZZt26dVq1apVatWsnyi1pTgwYN9P333zsxMgAAHKAiZwoO4BKVjTNnzig8PLxI+6VLl+ySDwAAUPG4RLLRokULrV+/3vb65wRjwYIFat26tbPCAgDAISwO+k9F5RLDKMnJybrnnnt06NAhFRQUaM6cOTp48KC++OILbd261dnhAQBQJu5epHeJykabNm30+eefKycnR7Vr19aHH36oiIgIffHFF2rWrJmzwwMAoEyYIOoiGjdurCVLljg7DAAA4GB
OTTY8PDx+dwKoxWK57iFtAABUGBW5LOEATk021q5de8N727Zt09y5c2UYRjlGBACA41XkyZ2O4NRk44EHHijS9s033ygpKUnvvfee4uPjNXXqVCdEBgAAHMUlJohK0unTpzVs2DA1adJEBQUF2rt3r5YsWaIaNWo4OzQAAMrEYnHMVVE5PdnIysrS008/rTp16ujgwYP6+OOP9d5776lRo0bODg0AAIdgNYoTzZgxQ9OnT1dkZKT+/e9/X3dYBQAAVGwWw4kzMD08POTn56cuXbrI09Pzhv3WrFlToudevHy1rKEBlVJ463HODgFwObl75pn+GYfTLjnkOfWjAhzynPLm1MrGo48+ytknAIBKj9UoTrR48WJnfjwAACgHLrODKAAAlZW7F/FJNgAAMJmb5xokGwAAmM7Nsw2n77MBAAAqN5INAABMZnHQf0pi8uTJslgsdldkZKTtvmEYmjx5sqKjo+Xn56eOHTvq4MGDjv7qkkg2AAAwnbO2K2/YsKHS0tJs1/79+233ZsyYoZkzZ2revHnasWOHIiMj1bVrV128eNGB3/wakg0AACopLy8vRUZG2q6wsDBJ16oas2fP1rPPPqs+ffqoUaNGWrJkiXJycrRixQqHx0GyAQCAyRx1NkpeXp4uXLhgd+Xl5d3wc48eParo6GjFxcWpf//+OnbsmCQpNTVV6enp6tatm62v1WpVhw4dtG3bNgd/e5INAADM56BsIzk5WcHBwXZXcnLydT+yZcuWWrp0qf773/9qwYIFSk9PV5s2bXT27Fmlp6dLkiIiIuzeExERYbvnSCx9BQCggkhKStL48ePt2qxW63X79ujRw/Zz48aN1bp1a9WuXVtLlixRq1atJKnIkSGGYZhyjAiVDQAATOao1ShWq1VBQUF2142SjV8LCAhQ48aNdfToUduqlF9XMTIyMopUOxyBZAMAAJM5azXKL+Xl5enw4cOKiopSXFycIiMjtWnTJtv9K1euaOvWrWrTpk0Zv21RDKMAAFAJTZw4Uffdd59q1KihjIwMvfjii7pw4YIGDx4si8WihIQETZs2TXXr1lXdunU1bdo0+fv7a+DAgQ6PhWQDAACTOWO38lOnTmnAgAH66aefFBYWplatWunLL79UbGysJCkxMVG5ubkaNWqUMjMz1bJlS3344YcKDAx0eCwWwzAMhz/VyS5evursEACXFN56nLNDAFxO7p55pn/G8bOXHfKcmtV8HfKc8kZlAwAAk5V0q/HKhgmiAADAVFQ2AAAwmQlbV1QoJBsAAJjMzXMNhlEAAIC5qGwAAGAyhlEAAIDJ3DvbYBgFAACYisoGAAAmYxgFAACYys1zDYZRAACAuahsAABgMoZRAACAqdz9bBSSDQAAzObeuQZzNgAAgLmobAAAYDI3L2yQbAAAYDZ3nyDKMAoAADAVlQ0AAEzGahQAAGAu9841GEYBAADmorIBAIDJ3LywQbIBAIDZWI0CAABgIiobAACYjNUoAADAVAyjAAAAmIhkAwAAmIphFAAATObuwygkGwAAmMzdJ4gyjAIAAExFZQMAAJMxjAIAAEzl5rkGwygAAMBcVDYAADCbm5c2SDYAADAZq1EAAABMRGUDAACTsRoFAACYys1zDZINAABM5+bZBnM2AACAqahsAABgMndfjUKyAQCAydx9gijDKAAAwFQWwzAMZweByikvL0/JyclKSkqS1Wp1djiAy+BvA+6GZAOmuXDhgoKDg5WVlaWgoCBnhwO4DP424G4YRgEAAKYi2QAAAKYi2QAAAKYi2YBprFarnn/+eSbAAb/C3wbcDRNEAQCAqahsAAAAU5FsAAAAU5FsAAAAU5FswOGOHz8ui8WivXv3/ma/jh07KiEhoVxiAiqymjVravbs2c4OAyg1kg03NmTIEFksFlksFnl7e6tWrVqaOHGiLl26VKbnxsTEKC0tTY0aNZIkbdmyRRaLRefPn7frt2bNGk2dOrVMnwWU1c9/By+//LJd+7p
162Qp59OzFi9erJtuuqlI+44dOzR8+PByjQVwJJINN3fPPfcoLS1Nx44d04svvqjXX39dEydOLNMzPT09FRkZKS+v3z5UOCQkRIGBgWX6LMARfH19NX36dGVmZjo7lOsKCwuTv7+/s8MASo1kw81ZrVZFRkYqJiZGAwcOVHx8vNatW6e8vDyNGzdO4eHh8vX1Vbt27bRjxw7b+zIzMxUfH6+wsDD5+fmpbt26WrRokST7YZTjx4+rU6dOkqSqVavKYrFoyJAhkuyHUZKSktSqVasi8TVp0kTPP/+87fWiRYtUv359+fr66tZbb9Xrr79u0m8G7qRLly6KjIxUcnLyDfts27ZN7du3l5+fn2JiYjRu3Di7KmBaWpruvfde+fn5KS4uTitWrCgy/DFz5kw1btxYAQEBiomJ0ahRo5SdnS3pWgXwscceU1ZWlq3iOHnyZEn2wygDBgxQ//797WLLz89XaGio7W/QMAzNmDFDtWrVkp+fn5o2bar//Oc/DvhNAaVDsgE7fn5+ys/PV2JiolavXq0lS5Zo9+7dqlOnjrp3765z585Jkp577jkdOnRIGzdu1OHDhzV//nyFhoYWeV5MTIxWr14tSTpy5IjS0tI0Z86cIv3i4+P11Vdf6fvvv7e1HTx4UPv371d8fLwkacGCBXr22Wf10ksv6fDhw5o2bZqee+45LVmyxIxfBdyIp6enpk2bprlz5+rUqVNF7u/fv1/du3dXnz59tG/fPq1atUopKSkaM2aMrc+jjz6q06dPa8uWLVq9erX+8Y9/KCMjw+45Hh4eeu2113TgwAEtWbJEn3zyiRITEyVJbdq00ezZsxUUFKS0tDSlpaVdt8oYHx+vd99915akSNJ///tfXbp0SQ8++KAk6c9//rMWLVqk+fPn6+DBg3rqqac0aNAgbd261SG/L6DEDLitwYMHGw888IDt9VdffWVUq1bNeOihhwxvb29j+fLltntXrlwxoqOjjRkzZhiGYRj33Xef8dhjj133uampqYYkY8+ePYZhGMbmzZsNSUZmZqZdvw4dOhh//OMfba+bNGlivPDCC7bXSUlJRosWLWyvY2JijBUrVtg9Y+rUqUbr1q1L8rUBO7/8O2jVqpXx+OOPG4ZhGGvXrjV+/kfkI488YgwfPtzufZ999pnh4eFh5ObmGocPHzYkGTt27LDdP3r0qCHJmDVr1g0/+6233jKqVatme71o0SIjODi4SL/Y2Fjbc65cuWKEhoYaS5cutd0fMGCA8fDDDxuGYRjZ2dmGr6+vsW3bNrtnDB061BgwYMBv/zIAk1DZcHPvv/++qlSpIl9fX7Vu3Vrt27fX2LFjlZ+fr7Zt29r6eXt7684779Thw4clSSNHjtTKlSt12223KTExUdu2bStzLPHx8Vq+fLmka2Xgf//737aqxpkzZ3Ty5EkNHTpUVapUsV0vvviiXTUEKIvp06dryZIlOnTokF37rl27tHjxYrv/7XXv3l1Xr15Vamqqjhw5Ii8vL91xxx2299SpU0dVq1a1e87mzZvVtWtXVa9eXYGBgXr00Ud19uzZEk3K9vb21sMPP2z7W7l06ZLeeecd29/KoUOHdPnyZXXt2tUu3qVLl/K3Aqf57Rl8qPQ6deqk+fPny9vbW9HR0fL29tbXX38tSUVm4huGYWvr0aOHTpw4ofXr1+ujjz7S3XffrdGjR+uVV14pdSwDBw7Un/70J+3evVu5ubk6efKkbWz66tWrkq4NpbRs2dLufZ6enqX+TOCX2rdvr+7du+uZZ56xzS2Srv3v78knn9S4ceOKvKdGjRo6cuTIdZ9n/OI0iBMnTqhnz54aMWKEpk6dqpCQEKWkpGjo0KHKz88vUZzx8fHq0KGDMjIytGnTJvn6+qpHjx62WCVp/fr1ql69ut37OIsFzkKy4eYCAgJUp04du7Y6derIx8dHKSkpGjhwoKRrE9B27txpty9GWFiYhgwZoiFDhuiuu+7SpEmTrpts+Pj4SJI
KCwt/M5abb75Z7du31/Lly5Wbm6suXbooIiJCkhQREaHq1avr2LFjtn+DA8zw8ssv67bbblO9evVsbXfccYcOHjxY5G/lZ7feeqsKCgq0Z88eNWvWTJL03Xff2S333rlzpwoKCvTqq6/Kw+NaUfmtt96ye46Pj8/v/p1I1+Z3xMTEaNWqVdq4caMefvhh299ZgwYNZLVa9cMPP6hDhw4l+u6AWUg2UERAQIBGjhypSZMmKSQkRDVq1NCMGTOUk5OjoUOHSpL+8pe/qFmzZmrYsKHy8vL0/vvvq379+td9XmxsrCwWi95//3317NlTfn5+qlKlynX7xsfHa/Lkybpy5YpmzZpld2/y5MkaN26cgoKC1KNHD+Xl5Wnnzp3KzMzU+PHjHftLgNtq3Lix4uPjNXfuXFvb008/rVatWmn06NEaNmyYAgICdPjwYW3atElz587Vrbfeqi5dumj48OG2SuGECRPk5+dnqwbWrl1bBQUFmjt3ru677z59/vnneuONN+w+u2bNmsrOztbHH3+spk2byt/f/7pLXi0WiwYOHKg33nhD3377rTZv3my7FxgYqIkTJ+qpp57S1atX1a5dO124cEHbtm1TlSpVNHjwYJN+c8BvcPKcETjRryeI/lJubq4xduxYIzQ01LBarUbbtm2N7du32+5PnTrVqF+/vuHn52eEhIQYDzzwgHHs2DHDMIpOEDUMw3jhhReMyMhIw2KxGIMHDzYMo+gEUcMwjMzMTMNqtRr+/v7GxYsXi8S1fPly47bbbjN8fHyMqlWrGu3btzfWrFlTpt8D3Nv1/g6OHz9uWK1W45f/iNy+fbvRtWtXo0qVKkZAQIDRpEkT46WXXrLdP336tNGjRw/DarUasbGxxooVK4zw8HDjjTfesPWZOXOmERUVZfj5+Rndu3c3li5dWmTy9IgRI4xq1aoZkoznn3/eMAz7CaI/O3jwoCHJiI2NNa5evWp37+rVq8acOXOMW265xfD29jbCwsKM7t27G1u3bi3bLwsoJY6YBwATnDp1SjExMbY5TYA7I9kAAAf45JNPlJ2drcaNGystLU2JiYn68ccf9e2338rb29vZ4QFOxZwNAHCA/Px8PfPMMzp27JgCAwPVpk0bLV++nEQDEJUNAABgMjb1AgAApiLZAAAApiLZAAAApiLZAAAApiLZAAAApiLZAFBiFotF69atc3YYACoIkg3AxW3btk2enp665557SvS+mjVravbs2eYEBQAlQLIBuLg333xTY8eOVUpKin744QdnhwMAJUayAbiwS5cu6a233tLIkSPVq1cvLV682O7+u+++q+bNm8vX11ehoaHq06ePJKljx446ceKEnnrqKVksFtvJo5MnT9Ztt91m94zZs2erZs2attc7duxQ165dFRoaquDgYHXo0EG7d+8282sCqORINgAXtmrVKt1yyy265ZZbNGjQIC1atEg/b/q7fv169enTR/fee6/27Nmjjz/+WM2bN5ckrVmzRjfffLNeeOEFpaWlKS0trdifefHiRQ0ePFifffaZvvzyS9WtW1c9e/bUxYsXTfmOACo/zkYBXNjChQs1aNAgSdI999yj7Oxsffzxx+rSpYteeukl9e/fX1OmTLH1b9q0qSQpJCREnp6eCgwMVGRkZIk+s3Pnznav//73v6tq1araunWrevXqVcZvBMAdUdkAXNSRI0e0fft29e/fX5Lk5eWlfv366c0335Qk7d2715SjyzMyMjRixAjVq1dPwcHBCg4OVnZ2NvNFAJQalQ3ARS1cuFAFBQWqXr26rc0wDHl7eyszM1N+fn4lfqaHh4d+ffZifn6+3eshQ4bozJkzmj17tmJjY2W1WtW6dWtduXKldF8EgNujsgG4oIKCAi1dulSvvvqq9u7da7u+/vprxcbGavny5WrSpIk+/vjjGz7Dx8dHhYWFdm1hYWFKT0+3Szj27t1r1+ezzz7TuHHj1LNnTzVs2FBWq1U//fSTQ78fAPdCZQN
wQe+//74yMzM1dOhQBQcH29176KGHtHDhQs2aNUt33323ateurf79+6ugoEAbN25UYmKipGv7bHz66afq37+/rFarQkND1bFjR505c0YzZszQQw89pA8++EAbN25UUFCQ7fl16tTRsmXL1Lx5c124cEGTJk0qVRUFAH5GZQNwQQsXLlSXLl2KJBqS9OCDD2rv3r0KCgrS22+/rXfffVe33XabOnfurK+++srW74UXXtDx48dVu3ZthYWFSZLq16+v119/XX/729/UtGlTbd++XRMnTrR7/ptvvqnMzEzdfvvteuSRRzRu3DiFh4eb+4UBVGoW49cDuAAAAA5EZQMAAJiKZAMAAJiKZAMAAJiKZAMAAJiKZAMAAJiKZAMAAJiKZAMAAJiKZAMAAJiKZAMAAJiKZAMAAJiKZAMAAJjq/wHlW2oFB8ggZAAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "conf_mat = confusion_matrix(y_test, y_pred_rf, labels=[1, 0]).transpose() # Transpose the sklearn confusion matrix to match the convention in the lecture\n", "sns.heatmap(conf_mat, annot=True, cmap='Blues', fmt='g', xticklabels=['Positive', 'Negative'], yticklabels=['Positive', 'Negative'])\n", "plt.xlabel(\"Actual\")\n", "plt.ylabel(\"Predicted\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Pre-trained Models from Hugging Face for Sentiment Analysis\n", "\n", "Instead of using the sentence embeddings as input features for a separate machine learning model, we can also directly use a pre-trained model from Hugging Face that is fine-tuned for sentiment analysis." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:06.262803Z", "iopub.status.busy": "2026-02-23T20:16:06.262620Z", "iopub.status.idle": "2026-02-23T20:16:07.449391Z", "shell.execute_reply": "2026-02-23T20:16:07.448470Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).\n", "Using a pipeline without specifying a model name and revision in production is not recommended.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Device set to use mps:0\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[{'label': 'NEGATIVE', 'score': 0.9997608065605164}]\n" ] } ], "source": [ "analyzer = pipeline(\"sentiment-analysis\") \n", "result = analyzer(\"The ECB's monetary policy is very ineffective for stabilizing the economy.\")\n", "print(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This uses a pre-trained model that has been fine-tuned on a large dataset for sentiment 
analysis. The output shows the predicted label (e.g., \"NEGATIVE\") and the confidence score for that prediction. To see which model the pipeline uses by default, we can print its `model` attribute" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:07.451928Z", "iopub.status.busy": "2026-02-23T20:16:07.451743Z", "iopub.status.idle": "2026-02-23T20:16:07.455366Z", "shell.execute_reply": "2026-02-23T20:16:07.454559Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DistilBertForSequenceClassification(\n", " (distilbert): DistilBertModel(\n", " (embeddings): Embeddings(\n", " (word_embeddings): Embedding(30522, 768, padding_idx=0)\n", " (position_embeddings): Embedding(512, 768)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " (transformer): Transformer(\n", " (layer): ModuleList(\n", " (0-5): 6 x TransformerBlock(\n", " (attention): DistilBertSdpaAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (q_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (k_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (v_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (out_lin): Linear(in_features=768, out_features=768, bias=True)\n", " )\n", " (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (ffn): FFN(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (lin1): Linear(in_features=768, out_features=3072, bias=True)\n", " (lin2): Linear(in_features=3072, out_features=768, bias=True)\n", " (activation): GELUActivation()\n", " )\n", " (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " )\n", " )\n", " )\n", " )\n", " (pre_classifier): Linear(in_features=768, out_features=768, bias=True)\n", " (classifier): Linear(in_features=768, out_features=2, bias=True)\n", " (dropout): Dropout(p=0.2, 
inplace=False)\n", ")\n" ] } ], "source": [ "print(analyzer.model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that it is using DistilBERT as the model for sentiment analysis. \n", "\n", "Let's apply this sentiment analysis pipeline to the sentences in our dataset and see how well it performs compared to our Random Forest classifier that used sentence embeddings as features." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:07.457388Z", "iopub.status.busy": "2026-02-23T20:16:07.457207Z", "iopub.status.idle": "2026-02-23T20:16:18.222446Z", "shell.execute_reply": "2026-02-23T20:16:18.221559Z" } }, "outputs": [], "source": [ "results = analyzer(df[\"text\"].tolist(), batch_size=32)\n", "df[\"hf_sentiment\"] = [int(r['label'] == \"POSITIVE\") for r in results]\n", "df[\"hf_score\"] = [r['score'] if r['label'] == \"POSITIVE\" else 1 - r['score'] for r in results]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can evaluate the performance of the Hugging Face sentiment analysis model using the same metrics as before" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:18.225095Z", "iopub.status.busy": "2026-02-23T20:16:18.224891Z", "iopub.status.idle": "2026-02-23T20:16:18.240328Z", "shell.execute_reply": "2026-02-23T20:16:18.239529Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.9298245614035088\n", "Precision: 0.863849765258216\n", "Recall: 0.9633507853403142\n", "ROC AUC: 0.9794478228350298\n" ] } ], "source": [ "test_idx = y_test.index\n", "print(f\"Accuracy: {accuracy_score(df.loc[test_idx, 'sentiment'], df.loc[test_idx, 'hf_sentiment'])}\")\n", "print(f\"Precision: {precision_score(df.loc[test_idx, 'sentiment'], df.loc[test_idx, 'hf_sentiment'])}\")\n", "print(f\"Recall: {recall_score(df.loc[test_idx, 'sentiment'], df.loc[test_idx, 
'hf_sentiment'])}\")\n", "print(f\"ROC AUC: {roc_auc_score(df.loc[test_idx, 'sentiment'], df.loc[test_idx, 'hf_score'])}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Hugging Face model does not perform quite as well as our Random Forest classifier that used sentence embeddings. This is likely because the pre-trained model was fine-tuned on movie reviews (the SST-2 dataset) rather than on our dataset of ECB speeches. \n", "\n", "\n", ":::{.callout-note}\n", "\n", "#### Fine-Tuning Pre-trained Models\n", "\n", "The transformers library also allows us to fine-tune pre-trained models on our own dataset, which can significantly improve performance on the task at hand. However, fine-tuning a large language model can be computationally expensive and may require access to a GPU. Therefore, we will not cover fine-tuning in this lecture, but it is an important topic to explore if you want the best possible performance on your task.\n", "\n", ":::\n", "\n", "\n", "### Other NLP Tasks with Pre-trained Models \n", "\n", "The transformers library can also be used for many other NLP tasks. For example, we can use a pre-trained model for text generation" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:18.242451Z", "iopub.status.busy": "2026-02-23T20:16:18.242264Z", "iopub.status.idle": "2026-02-23T20:16:31.944200Z", "shell.execute_reply": "2026-02-23T20:16:31.943248Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Device set to use mps:0\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "The following generation flags are not valid and may be ignored: ['temperature']. 
Set `TRANSFORMERS_VERBOSITY=info` for more details.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n" ] }, { "data": { "text/plain": [ "[{'generated_text': \"The ECB's monetary policy is very important. It is important to understand that the ECB is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. It is not a central bank. 
It is\"}]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "generator = pipeline(model=\"openai-community/gpt2\")\n", "generator(\"The ECB's monetary policy is very\", do_sample=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we are using a pre-trained GPT-2 model to generate text based on the input prompt \"The ECB's monetary policy is very\". Because we set `do_sample=False`, the model decodes greedily, i.e., it always picks the most likely next token, which makes the output deterministic but also prone to the repetition seen above.\n", "\n", "We can also use a pre-trained model for zero-shot classification, which allows us to classify text into categories without having to fine-tune the model on a specific dataset" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:31.946260Z", "iopub.status.busy": "2026-02-23T20:16:31.946073Z", "iopub.status.idle": "2026-02-23T20:16:35.647367Z", "shell.execute_reply": "2026-02-23T20:16:35.646459Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Device set to use mps:0\n" ] } ], "source": [ "classifier = pipeline(\"zero-shot-classification\", model=\"facebook/bart-large-mnli\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using this zero-shot classification pipeline, we can assign sentences to categories even if the model has not been specifically trained on those categories. For example, we can classify sentences from ECB speeches as \"monetary policy\", \"fiscal policy\", or \"other\"." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2026-02-23T20:16:35.649538Z", "iopub.status.busy": "2026-02-23T20:16:35.649373Z", "iopub.status.idle": "2026-02-23T20:16:36.617755Z", "shell.execute_reply": "2026-02-23T20:16:36.617068Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequence: The European Central Bank is committed to price stability.\n", "Predicted label: monetary policy\n", "Confidence score: 0.9034\n", "------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: Governments need to run a balanced budget.\n", "Predicted label: fiscal policy\n", "Confidence score: 0.8744\n", "------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: Leslie Nielsen was a great actor.\n", "Predicted label: other\n", "Confidence score: 0.8404\n", "------\n" ] } ], "source": [ "sequences_to_classify = [\n", " \"The European Central Bank is committed to price stability.\",\n", " \"Governments need to run a balanced budget.\",\n", " \"Leslie Nielsen was a great actor.\"]\n", "candidate_labels = [\"monetary policy\", \"fiscal policy\", \"other\"]\n", "\n", "for sequence in sequences_to_classify:\n", " result = classifier(sequence, candidate_labels)\n", "\n", " print(f\"Sequence: {sequence}\")\n", " print(f\"Predicted label: {result['labels'][0]}\")\n", " print(f\"Confidence score: {result['scores'][0]:.4f}\")\n", " print(\"------\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3", "path": "/usr/local/share/jupyter/kernels/python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 4 }