Romanian Balanced Corpus (ROMBAC)



ROMBAC is a Romanian corpus containing equal shares of text from five genres: journalism, legalese, fiction, medicine, and biographical data on Romanian literary personalities. For each genre, texts totalling around 7,000,000 words were selected, so that the entire corpus counts around 41,000,000 words, including punctuation.
