Lecturer(s)
|
- Mrkvička Miroslav, doc. Ing. Ph.D.
|
Course content
|
1. Motivation and program of the course. Introduction.
2. Information extraction from the Web, web crawlers.
3. Tokenization, stemming, Porter stemmer, lemmatization, POS tagging, parsing. Dictionaries, edit distance.
4. Information retrieval, Boolean model, indexing.
5. Query and document similarity, vector space model, top hits selection.
6. The Web as a graph, link analysis, PageRank, HITS.
7. Evaluation of an IR system, standard evaluation corpora, evaluation of relevance.
8. XML retrieval, vector space model for XML retrieval.
9. Question answering.
10. Multimedia information retrieval.
11. Text classification, feature selection, classification evaluation, classification in the vector space model. Detection of plagiarism and spam.
12. Text clustering, determining the number of clusters. News clustering systems.
13. Introduction to text analysis: information extraction, text summarization, opinion mining.
|
Learning activities and teaching methods
|
Lecture supplemented with a discussion, Project-based instruction, Discussion, Multimedia supported teaching, Students' portfolio, Skills demonstration, Task-based study method, Individual study, Textual studies, Practicum
- Individual project (40): 40 hours per semester
- Contact hours: 65 hours per semester
- Preparation for an examination (30-60): 55 hours per semester
|
prerequisite |
---|
Knowledge |
---|
understand what application software can do to process growing volumes of data more effectively |
explain the principles of relational databases, data integrity and basic SQL statements; describe data modelling approaches |
describe the principles of procedural and object-oriented programming languages including the basic control structures and data representation forms, explain the fundamental data structures and algorithms to work with them |
Skills |
---|
sort, process and present the acquired information in both written and oral forms in Czech and English; produce documentation of the implemented work or its components |
acquire and process information from sources in English |
design a small- to medium-sized database or information system; design and implement a simple stand-alone web application |
master the principles of writing well-documented and robust program code; make use of the theoretical as well as practical knowledge of algorithms, data structures and specific developer tools |
Competences |
---|
N/A |
learning outcomes |
---|
Knowledge |
---|
explain and illustrate the methods and models for the representation and processing of large-scale unstructured data |
describe the principles of natural language processing and of textual data search |
Skills |
---|
make efficient use of the methods and technologies for the search in large-scale unstructured data |
implement various web search methods and basic natural language processing techniques |
Competences |
---|
N/A |
N/A |
make use of one's professional knowledge, skills, and general abilities in English and, to some extent, also in one other foreign language |
teaching methods |
---|
Knowledge |
---|
Lecture supplemented with a discussion |
Self-study of literature |
Practicum |
Individual study |
Task-based study method |
Multimedia supported teaching |
Skills |
---|
Skills demonstration |
Competences |
---|
Lecture supplemented with a discussion |
assessment methods |
---|
Knowledge |
---|
Individual presentation at a seminar |
Continuous assessment |
Test |
Combined exam |
Skills |
---|
Project |
Skills demonstration during practicum |
Combined exam |
Competences |
---|
Combined exam |
Recommended literature
|
- Baeza-Yates, R.; Ribeiro-Neto, Berthier. Modern information retrieval. Harlow: Addison-Wesley, 1999. ISBN 0-201-39829-X.
- Büttcher, Stefan; Clarke, Charles L. A.; Cormack, Gordon V. Information Retrieval: Implementing and Evaluating Search Engines. Cambridge: The MIT Press, 2016. ISBN 978-0-262-52887-0.
- Chakrabarti, Soumen. Mining the web: discovering knowledge from hypertext data. San Francisco: Morgan Kaufmann Publishers, 2003. ISBN 1-55860-754-7.
- Jurafsky, Daniel; Martin, James H. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. 2nd ed. Upper Saddle River: Pearson/Prentice Hall, 2009. ISBN 978-0-13-504196-3.
- Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich. Introduction to information retrieval. 1st pub. New York: Cambridge University Press, 2008. ISBN 978-0-521-86571-5.
|