Training: Summer School on Computational Approaches to Historical Linguistics, 21 July-1 August, Plovdiv, Bulgaria

Summer School on Computational Approaches to Historical Linguistics

Confirmed speakers so far: Giuseppe Longobardi (University of York), Paola Crisma (University of Trieste), Dimitar Kazakov (University of York), Cristina Guardiano (Modena e Reggio Emilia)

Host Institution: University of Plovdiv Paisii Hilendarski, Plovdiv, Bulgaria
Coordinating Institution: University of York, UK
Website: https://ca2hl.org/

Dates: 21-Jul-2025 – 01-Aug-2025
Location: Plovdiv, Bulgaria

Minimum Education Level: Graduate students and academics

Special Qualifications: An interest in the topic and a background in a related discipline, such as Linguistics, Computational Linguistics, Computer Science, Statistics, or scientific approaches to cognitive history.

Focus:
* Acquire the theoretical knowledge and practical skills described in the syllabus (see below);
* Collect new data (parameter settings and evidence) on languages represented among the participants;
* Produce an archival record of the data and findings co-authored by all contributors.

Syllabus:

Week 1: Introduction to Historical Linguistics. The Comparative Method. The Parametric Comparison Method (PCM). Case Study: The Indo-European Language Family. Language trees and language contact. Case Study: The Balkan Sprachbund.

Week 2: Parameter Setting for PCM. Case Study: Analysing Syntactic Parameters in Participants’ Languages. Data Visualisation: Gradient Maps of Principal Components and Their Historical Implications. Exploring Migrations and Cultural Contact through Linguistic Evidence. Clustering Techniques for Phylogenetic Tree Generation. Advanced Tools: Using Large Language Models (LLMs) for Parameter Evidence Collection.

In more detail:

Week 1 covers the basics of the classical (lexical-etymological) comparative method for phylogenetic reconstruction and gives a full introduction to the innovative syntactic-parametric method. The theory of parameters will be introduced from the perspective of language typology and of language acquisition. It will then be shown how to apply current syntactic knowledge and models to define the grammar of a new language (that is, to set its parameters), starting from either speakers’ knowledge or portions of text. The place of historical linguistics among the cognitive neurosciences will also be discussed.

Week 2 focuses on applying the knowledge acquired in Week 1 to collecting data on syntactic parameters and using them in historical linguistics. Participants will systematically analyse the syntactic parameters of their chosen language by responding to a predefined set of questions for each parameter and documenting example sentences that serve as evidence. The collected data will be integrated with existing datasets to generate gradient maps of principal components in the tradition of Cavalli-Sforza’s work in genetic classification of populations, facilitating discussions on their potential relevance as evidence for historical migrations and cultural contact. Additionally, participants will acquire hands-on experience with clustering techniques derived from bioinformatics to construct phylogenetic trees based on the compiled data. The week will conclude with an exercise in employing large language models (LLMs), such as ChatGPT, to assist in the identification and documentation of syntactic parameters of a given language.
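As a rough illustration of the principal-component step mentioned above, the following minimal Python sketch projects a few languages onto their first principal component. The language names, the parameter values and the function name first_component_scores are all invented for illustration; this is not the school's actual pipeline or data. In a gradient map, scores of this kind would then be interpolated over the geographic locations of the languages.

```python
import math

# Hypothetical toy data: rows are languages, columns are binary parameters
# (1 for "+", 0 for "-"). None of these values describe real languages.
DATA = {
    "LangA": [1, 1, 0, 1],
    "LangB": [1, 1, 0, 0],
    "LangC": [0, 0, 1, 1],
    "LangD": [0, 0, 1, 0],
}

def first_component_scores(data, iterations=200):
    """Project each language onto the first principal component.

    The component is found by power iteration on the covariance matrix,
    so no external numerical library is needed.
    """
    names = list(data)
    rows = [data[name] for name in names]
    n_cols = len(rows[0])
    # Centre every parameter column on its mean.
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    centred = [[r[j] - means[j] for j in range(n_cols)] for r in rows]
    # Sample covariance matrix of the centred columns.
    cov = [[sum(r[i] * r[j] for r in centred) / (len(rows) - 1)
            for j in range(n_cols)] for i in range(n_cols)]
    # Power iteration: repeated multiplication converges to the
    # eigenvector with the largest eigenvalue (arbitrary start vector).
    vec = [1.0 + 0.1 * j for j in range(n_cols)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(n_cols))
               for i in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return {name: sum(c * v for c, v in zip(row, vec))
            for name, row in zip(names, centred)}

scores = first_component_scores(DATA)
for name, score in scores.items():
    print(f"{name}: {score:+.3f}")
```

In this toy dataset LangA and LangB receive the same score and LangC and LangD the opposite one, because the first component is driven by the three parameters on which the two pairs differ. The sign of a principal component is arbitrary, so only relative positions are meaningful.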

Description:

We’re excited to invite graduate students and researchers to CA2HL 2025, an intensive two-week summer school focused on cutting-edge computational methods in historical linguistics.

Participants will engage with the Parametric Comparison Method (PCM), a novel approach to modelling language relationships through syntactic parameters, while contributing to the collection and analysis of new linguistic data. The collected data will be integrated with existing datasets to generate gradient maps of principal components in the tradition of Cavalli-Sforza’s work in genetic classification of populations, facilitating discussions on historical migrations and cultural contact. Participants will also acquire hands-on experience with clustering techniques derived from bioinformatics to construct phylogenetic trees of the studied languages. The school will conclude with an exercise in employing large language models (LLMs), such as ChatGPT, to assist in the identification and documentation of syntactic parameters of a given language.

The programme combines expert-led lectures, hands-on collaboration, and individual study, providing a unique opportunity to deepen your research and build lasting academic connections.

Background:
Historical linguistics studies how languages change or remain similar over time. It investigates the development, evolution, and relationships among languages, focusing on their phonology (sounds), morphology (word structure), syntax (sentence structure), semantics (meanings), and lexicon (vocabulary). It also examines how and why languages diverge, interact, and sometimes converge through processes such as language contact and borrowing. Historical linguistics aims to classify languages into families, understand the mechanisms and causes of language change, reconstruct proto-languages, and trace the cultural and social history of linguistic communities.

During the school’s two weeks, particular focus will be put on the novel Parametric Comparison Method (PCM). The PCM is a theoretical approach in historical linguistics that compares grammatical parameters rather than relying on traditional vocabulary-based or phonological comparisons. It derives from the idea that languages can be compared, and a significant historical signal retrieved, through a finite set of syntactic parameters that determine structural variation across languages. The PCM is rooted in the framework of Universal Grammar, which assumes that all human languages share a common set of principles and that variation arises through “parameters”: discrete, normally binary settings that determine specific syntactic properties, such as whether null subjects are allowed (e.g. Italian) or not (e.g. English). Instead of comparing word lists (as in the etymological method) or phonological changes (as in phonological reconstruction), the PCM identifies and compares syntactic parameters across languages and families (even remote ones), where shared parameter settings may suggest genealogical relationships or contact-induced convergence.
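To make the idea of comparing parameter settings concrete, here is a minimal Python sketch. It assumes each language is described by a string of settings, with "+" and "-" for the two values of a binary parameter and "0" for a parameter that is undefined in that language, and it uses a normalised Hamming distance over the parameters set in both languages. The languages, the settings and the function name parametric_distance are invented for illustration, and the distance measure is an assumption rather than a statement of the exact metric taught at the school.

```python
from itertools import combinations

# Hypothetical toy data: one character per parameter.
# "+" / "-" are settings; "0" marks a parameter that is undefined
# (not settable) in that language. These are not real language data.
SETTINGS = {
    "LangA": "++-+0-",
    "LangB": "++--0-",
    "LangC": "-+-++0",
}

def parametric_distance(a, b):
    """Share of differing settings among parameters set in both languages."""
    # Only compare parameters that have a value in both languages.
    pairs = [(x, y) for x, y in zip(a, b) if x != "0" and y != "0"]
    if not pairs:
        raise ValueError("no comparable parameters")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)

for (name1, s1), (name2, s2) in combinations(SETTINGS.items(), 2):
    print(f"{name1}-{name2}: {parametric_distance(s1, s2):.2f}")
```

With these toy values the script prints 0.20 for LangA-LangB, 0.25 for LangA-LangC and 0.50 for LangB-LangC. A table of such pairwise distances is the usual input to distance-based clustering and tree building.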

The method is not immune to borrowing effects, but syntactic parameters are less likely to be borrowed directly than vocabulary and sounds, which can pass between unrelated languages. It is applicable to languages with scarce lexical or phonological data but robust syntactic descriptions, and it aims to offer insights into deeper relationships among languages by focusing on structural features rather than surface-level elements. The PCM has been used to explore relationships both within language families and between distant languages, for instance to test hypothesised connections between language families in macro-comparative studies. For example, in comparing Japanese and Korean, the PCM might focus on shared syntactic features, such as their head-final structures or the shared presence or absence of particular grammatical features, rather than relying on vocabulary. This could help assess whether these similarities reflect genetic relatedness, contact, or just chance typological affinity.

As with all the most modern and sophisticated approaches to historical linguistics, the PCM requires substantial quantitative and computational treatment of the data and of the hypotheses of language relatedness. This is an especially stimulating challenge for both linguists and computer scientists interested in an innovative syntactic method of comparison and reconstruction: the particular and largely unexplored formal properties of historical syntax seem to require specific computational algorithms that cannot be borrowed unchanged from those used for phylogenetic inference in biology and lexical historical linguistics.
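For orientation, the sketch below shows one of the simplest distance-based tree-building algorithms from bioinformatics, UPGMA (average-linkage clustering), written in plain Python. It is offered only as a generic baseline of the kind the paragraph above says cannot be reused unchanged for syntactic data. The four languages, their distances and the function name upgma are invented for illustration.

```python
# Hypothetical pairwise distances between four toy languages.
DIST = {
    ("A", "B"): 0.10,
    ("A", "C"): 0.40,
    ("A", "D"): 0.50,
    ("B", "C"): 0.40,
    ("B", "D"): 0.50,
    ("C", "D"): 0.20,
}

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples.

    Expects a distance for every pair of names.
    """
    # Map each current cluster (a name or a nested tuple) to its leaf count.
    sizes = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Find and merge the closest pair of current clusters.
        pair = min(d, key=d.get)
        x, y = sorted(pair, key=str)
        merged = (x, y)
        for other in sizes:
            if other in pair:
                continue
            # Size-weighted mean, so the new distance stays an average
            # over all leaf-to-leaf distances.
            d[frozenset((merged, other))] = (
                sizes[x] * d[frozenset((x, other))]
                + sizes[y] * d[frozenset((y, other))]
            ) / (sizes[x] + sizes[y])
        # Drop every distance that involves one of the merged clusters.
        d = {p: v for p, v in d.items() if x not in p and y not in p}
        sizes[merged] = sizes.pop(x) + sizes.pop(y)
    return next(iter(sizes))

tree = upgma(["A", "B", "C", "D"], DIST)
print(tree)
```

On the toy distances this groups A with B and C with D before joining the two pairs. UPGMA assumes a roughly constant rate of change across all lineages, which is one reason it serves here as a baseline rather than a recommendation.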

References:
Campbell, Lyle. Historical Linguistics: An Introduction, EUP.

Cavalli-Sforza, Luigi Luca (2000). Genes, Peoples and Languages. Penguin Group.

Ceolin, Andrea, Cristina Guardiano, Giuseppe Longobardi, Monica A. Irimia, Luca Bortolussi, and Andrea Sgarro (2021). At the boundaries of syntactic prehistory. Philosophical Transactions of the Royal Society B, 376 (1824), 20200197.

Crisma, Paola, and Giuseppe Longobardi (eds) (2009). Historical Syntax and Linguistic Theory: Introduction. OUP.

Guardiano, Cristina, and Giuseppe Longobardi (2017). Parameter theory and parametric comparison. In Ian Roberts (ed.), The Oxford Handbook of Universal Grammar. OUP.

Guardiano, Cristina, Giuseppe Longobardi, Guido Cordoni, and Paola Crisma (2020). Formal syntax as a phylogenetic method. In R.D. Janda, B. Joseph, and B. Vance (eds), Blackwell’s Handbook of Historical Linguistics, Volume II, 145-182.

Lightfoot, David. How to Set Parameters: Arguments from Language Change.

Longobardi, Giuseppe, and Cristina Guardiano (2009). Evidence for syntax as a signal of historical relatedness, Lingua, 119.

Roberts, Ian (2007/2022). Diachronic Syntax. OUP.

Tuition: €99 (2-week fee)
Tuition Explanation: This amount covers tuition for the full 2-week period and does not include accommodation or meals (see the website for more information). If you want to attend only the first or the second week of the summer school, please indicate this in the registration form and contact the organisers. Please also contact the organisers if the tuition fee is a potential obstacle to your attendance.

Linguistic Field(s): Genetic Classification; Historical Linguistics

Registration Open until 15-Jul-2025

Contact Person: Dr Dimitar Kazakov
Email: dimitar.kazakov@york.ac.uk

Apply by Email: dimitar.kazakov@york.ac.uk
Apply on the web: https://forms.gle/QU3cgXeX9RaPuCSX8

Registration Instructions:
Please complete the registration of interest form and await a reply from the organisers.