The computational linguistics group at Uppsala University has a strongly empirical orientation emphasizing multilingual systems, especially machine translation, and systems for grammatical analysis of text, in particular dependency-based parsing. Another focus area is digital humanities with projects on handwritten text recognition, historical text processing, and historical ciphers. The group has been involved in the development of a number of tools and resources, such as MaltParser (data-driven dependency parser), UPlug (toolbox for parallel corpus alignment), Swedish Treebank (syntactically annotated corpus), and OPUS (multilingual parallel corpus). Below we describe some of our current projects of relevance to CLARIN.
Universal Dependencies is an open-source community effort to create cross-linguistically consistent treebank annotation for many languages. This will in turn facilitate the deployment of parsing technology to support research in the Humanities and Social Sciences for multiple languages. So far, treebanks have been released for 18 languages (including Swedish).
The project group From Quill to Bytes works on analyzing historical handwritten documents with the help of methods from image analysis and language technology. In particular, the aims are to develop methods for finding linguistic items directly in manuscript images, for automatic transcription of manuscripts, and for identifying the scribe, style, or age of manuscripts.
In the Gender & Work project, historians do research on what men and women did for a living in Early Modern Swedish society (1550-1800). This information is currently extracted by manually going through large volumes of text in search of passages describing work. We develop techniques for assisting in this process by automatically extracting phrases with a high probability of describing working activities, based on spelling normalization and linguistic analysis.
Thousands of enciphered historical manuscripts are buried in libraries and archives. We develop computer-aided tools for automatic decoding of historical ciphers. The project involves systematic detection of various cipher types, the development of algorithms for decryption, and the creation of language models and pattern dictionaries for early variants of European languages.
Visit to CiltLab in Linköping
May saw another visit by the national coordination team to a Swe-Clarin centre, as proposed at the kick-off meeting. The purpose of these visits is to offer the centres and the national coordination a chance to get to know each other and to discuss how Swe-Clarin can work as efficiently as possible. On Friday the 29th, Lars, Caspar, and Stefan travelled to Linköping for a pleasant meeting with centre director Magnus Merkel and expert Lars Ahrenberg.
The Swe-Clarin centre at Linköping University is located at CiltLab (Cognition, Interaction, and Language Technology) under the section of Human-Centred Systems, Department of Computer and Information Science. We were given a presentation of the research and teaching carried out within language technology by the nine language technologists active at CiltLab, and a run-through of the nine resources that the centre will make available to CLARIN. The broader discussions about Swe-Clarin concerned, inter alia, the differences between CLARIN K centres and L centres, and the possibility of developing a base kit with tools for e.g. tokenisation, lemmatisation, and tagging. The latter issue will be raised again at the virtual meeting on June 12.
Visits to the remaining centres of Swe-Clarin will hopefully take place during the latter half of September.
On the European level, it is worth noting that all CLARIN services with federated login are also available to the Swedish academic community (i.e. they can log in using their university accounts). First tests indicate that the connection via eduGAIN works well. Another piece of news is that Leif-Jöran Olsson from Språkbanken has taken over the responsibility for developing CLARIN’s Content Search.
This is the last newsletter before the summer vacation. We will return in August and wish you all a wonderful summer!
Calendar
9-10 June: Nordic Clarin Network Workshop.
12 June: Virtual meeting for the Swe-Clarin partners, 10am to noon.
5-6 October: Nordic Clarin Network Workshop in connection with the Language Bank’s autumn workshop on historical resources.
11 November: SND’s autumn workshop under the theme “New Conditions for Research”.
19-20 November: Swe-Clarin general meeting in Stockholm, including a Swe-Clarin workshop on Friday afternoon.
Everybody is welcome.
Partners
Swe-Clarin has nine partners from Lund, Gothenburg, Linköping, Stockholm and Uppsala, in universities and public authorities.
A list and description of all partners may be found here: http://sweclarin.se/swe/centrum
News
We will not go on spamming you. Should you want more info on Swe-Clarin, please sign up for the news list here: http://lists.sweclarin.se/mailman/listinfo/news_lists.sweclarin.se