search in huge text corpora

learn how words are used

compare and contrast words visually

upload and search your own corpora

build specialised corpora instantly from the Web

extract wordlists, keywords, terms and thesauri

explore the distributional thesaurus with word clouds

The Sketch Engine

The Sketch Engine is for anyone who wants to research how words behave. It is a Corpus Query System that incorporates word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour.

Learn more about Sketch Engine features, language resources and tools.

Users and Uses

Lexical Computing

The Sketch Engine is a product of Lexical Computing – a small research company, founded by Adam Kilgarriff in 2003. It works at the intersection of corpus and computational linguistics, and is committed to an empiricist approach to the study of language in which corpora play a central role: for a very wide range of linguistic questions, if a suitable corpus is available, it will help our understanding. Its strap line is ‘corpora for all’.

To provide corpus services, Lexical Computing Ltd (LCL) needs corpora. As of May 2013, we have large corpora for 52 languages (‘large’ meaning over 1 million words; in most cases the corpora are over 100 million words). For the most part these are collected from the web, where LCL is a leading player in the ‘web as corpus’ initiative, and have involved collaborations with language experts for the languages in question.

People, directors

Adam Kilgarriff is the company's founder and owner. He is a corpus and computational linguist. Following a PhD on word meaning at Sussex University, he worked at Longman Dictionaries and Brighton University before setting up LCL. He has published widely on word senses, corpora and lexicography, and has given keynote lectures at a number of conferences. He organised the first SENSEVAL competition for the evaluation of word sense disambiguation systems. He chaired the Association for Computational Linguistics Special Interest Groups on the Lexicon (2000–2004) and on Web as Corpus (2006–2009; founding chair), and was a European Association for Lexicography board member from 2002 to 2006. He is a Visiting Research Fellow at the University of Leeds.

Pavel Rychlý is a computer scientist and computational linguist. His PhD thesis was on optimal designs for corpus query systems. Since then he has developed first the Manatee system and, since 2003, the Sketch Engine. He is a lecturer and senior researcher at the NLP Centre, Masaryk University, in Brno, Czech Republic.