Blog post #2: Challenges in the Analysis of Political Texts for the Detection of Executive Aggrandisement

As TWIN4DEM moves from conceptual groundwork to data-driven implementation, Work Package 3 plays a pivotal role in translating political dynamics into measurable signals. WP3 focuses on detecting executive aggrandisement and the reactions it triggers by systematically analysing political texts across countries and languages. During the first year of the project, partners have laid the foundations for a robust computational pipeline that will feed directly into the Digital Twin.

Collecting Political Texts for the Detection of Executive Aggrandisement

The first step towards identifying executive aggrandisement and analysing politicians' stances and reactions is collecting the relevant data. In TWIN4DEM, multiple textual sources are being integrated, including legislative texts, parliamentary debates and politicians' social media posts. Several initiatives already collect and publish such data, both through official channels such as Data.gouv.fr and through researchers' and activists' projects such as ParlaClarin. However, no existing infrastructure provides this comprehensive data ecosystem. All partners involved in Work Package 3 are therefore dealing with the challenges of finding, retrieving, processing and storing the data needed for the task. Among the most pressing issues is the variability of data formats, not only across countries but also over time. For example, parliamentary data may be available in a given format only for some legislatures, whereas TWIN4DEM plans to cover the period from 2010 to 2024 and needs a balanced dataset for the whole time span. Likewise, all political parties and positions should be equally represented in the data.

Paradoxically, two further challenges are data scarcity and data overflow. For example, while Twitter/X has been a popular communication channel for politicians across Europe, this is not the case in Hungary, where we will have to implement a different strategy to retrieve MPs' public statements. On the other hand, the available data are sometimes simply too abundant to be meaningfully processed for examples of aggrandisement. For instance, the French use case includes 245,500 regulatory or administrative acts issued by ministers, prefects and mayors, and these account for only a portion of the legislative data we will process for France in TWIN4DEM. In such cases we need to narrow the scope of the analysis by identifying a subcorpus of documents whose topics, structure and linguistic expressions make them likely to signal aggrandisement. This selection is driven by the domain experts in our consortium and requires a different strategy for each use case. By combining political scientists' knowledge with a range of NLP techniques, we aim to describe and uncover the "language of aggrandisement" in statutory documents.

What Comes Next

Despite these challenges, the data produced in WP3 will be a unique resource for the scientific community, data scientists, journalists, political organisations, activists and, more generally, all citizens interested in monitoring political activity. While previous initiatives have interlinked metadata on politicians' profiles, parliamentary activity and biographical information, TWIN4DEM will make available a much richer database in which politicians' social media statements are aligned with their parliamentary speeches, voting behaviour and more. This will open new research directions, such as investigating the gap between politicians' online behaviour and their statements in Parliament, their alignment with their party's orientation, and their distance from other MPs. This textual resource will also enable novel NLP approaches for detecting executive aggrandisement, offering new perspectives on political opinions and supporting cross-country comparisons.