MHLT H6019 - Programming for NLP

Short Title: Programming for NLP
Full Title: Programming for Natural Language Processing
Module Code: MHLT H6019
ECTS credits: 10
NFQ Level: 9
Module Delivered in 1 programme(s)
Module Contributor:Irene Murtagh
Module Description: This module aims to: give students in-depth knowledge of the role of programming skills and strategies in NLP contexts; give students the skills necessary to apply a wide range of NLP-related algorithmic techniques in software development; provide the theoretical and practical framework needed to program natural language applications; give students in-depth knowledge and skills regarding the complex challenges of processing human languages in software; and instil an in-depth appreciation of the importance of quality in software development.
Learning Outcomes:
On successful completion of this module the learner will be able to:
  1. Develop programs that can manipulate and analyse human language data.
  2. Apply core concepts from NLP and linguistics to analyse and describe human language.
  3. Apply data structures and algorithms in NLP.
  4. Store complex language data in standard formats.
  5. Evaluate the performance and effectiveness of NLP techniques.

Module Content & Assessment

Indicative Content
Language Processing and Python (10%)
Computing with human language as texts, words, tokens and symbols. Texts as lists of words. Control structures and decision making. Automatic natural language understanding
Accessing digital lexical resources and text corpora (10%)
Accessing text corpora. Conditional frequency distributions. Code reusability as a strategy. Lexical resources. WordNet and FrameNet
Processing raw text (10%)
Accessing text from the Internet. Strings and text processing. Unicode representation. Applying regular expressions for detecting word patterns in syntax. Normalising text. Tokenising text. Segmentation. Processing lists and strings
Writing software code for NLP (10%)
Algorithm design. Programming style for good quality code. Structured programming. Functions and methods. Program development strategies
Categorising and tagging words (10%)
Using a Tagger. Tagged corpora. Mapping words to properties using Python dictionaries. Automatic tagging. N-Gram tagging. Determining the lexical category of a word
Extracting information from digital text (10%)
Information extraction. Chunking. Developing and evaluating chunkers. Recursion in linguistic structure. Named entity recognition. Relation extraction
Analysing the sentence in code (10%)
Modelling of linguistic patterns. The role of linguistic syntax. Context-free and context-sensitive grammar. Parsing with context-free grammar. Argument structure, valence and dependency grammar. Grammar development
Building Feature-Based Grammars (10%)
Grammatical features and attribute value matrices. Processing feature structures. Feature-based grammars
Analysing the meaning of sentences (10%)
Natural language understanding challenges. Propositional logic. First-order logic. The semantics of English sentences. Discourse semantics. Description logics
Current issues in managing linguistic data in software (10%)
The Life Cycle of a Corpus. Corpus Structure. Acquiring Data. Working with XML. Working with Toolbox Data. Describing Language Resources Using Open Language Archives Community (OLAC) metadata
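The following is a minimal sketch, not part of the official module material, of the kind of exercise the topics above involve: tokenising and normalising raw text with a regular expression, then building a conditional frequency distribution. It uses only the Python standard library; the function names `tokenize` and `conditional_freq` and the sample sentence are illustrative assumptions, and the recommended textbook by Bird, Klein and Loper covers the same ideas using its own toolkit.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    # Matches runs of letters, optionally followed by an apostrophe suffix ("didn't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def conditional_freq(tokens):
    """For each word (the condition), count the words that immediately follow it."""
    cfd = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        cfd[first][second] += 1
    return cfd


# Hypothetical sample text, chosen only for illustration.
sample = "The cat sat on the mat. The cat didn't see the dog."
tokens = tokenize(sample)
cfd = conditional_freq(tokens)

print(tokens)
print(cfd["the"].most_common(1))  # the most frequent word following "the"
```

A dictionary of counters keeps the sketch close to the "mapping words to properties using Python dictionaries" idea listed under tagging; a real exercise would typically use a much larger corpus.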
Indicative Assessment Breakdown | %
Course Work Assessment | 100.00
Course Work Assessment
Assessment Type | Assessment Description | Outcomes addressed | % of total | Assessment Date
Practical/Skills Evaluation | Practical work based on lecture material | 1, 5 | 20.00 | Every Week
Project | The student will typically be expected to construct effective, structured code that manipulates and analyses human language data, with an appropriate choice of data structures and algorithms. | 1, 2, 3 | 40.00 | Week 6
Project | The student will typically create and manipulate complex language data in a variety of formats within code, and will be expected to generate metrics that evaluate the performance and effectiveness of the NLP techniques and choices made. | 4 | 40.00 | Week 9
No Final Exam Assessment
Indicative Reassessment Requirement
Coursework Only
This module is reassessed solely on the basis of re-submitted coursework. There is no repeat written examination.

ITB reserves the right to alter the nature and timings of assessment.


Indicative Module Workload & Resources

Indicative Workload: Full Time
Frequency Indicative Average Weekly Learner Workload
Every Week 24.00
Every Week 24.00
Every Week 152.00
Indicative Workload: Part Time
Frequency Indicative Average Weekly Learner Workload
Every Week 24.00
Every Week 24.00
Every Week 152.00
Recommended Book Resources
  • Steven Bird, Ewan Klein, and Edward Loper 2009, Natural language processing with Python, O'Reilly Sebastopol, Calif. [ISBN: 0596516495]
  • XML Processing with Perl, Python, and PHP: Also Covers Tcl, Rebol, Ruby, and AppleScript, John Wiley & Sons Hoboken [ISBN: 0782140211]
  • Daniel Jurafsky, James H. Martin 2009, Speech and language processing, Pearson Upper Saddle River, NJ [ISBN: 0135041961]
Supplementary Book Resources
  • Brian Nolan and Carlos Periñán, Language processing and grammars: The role of functionally oriented computational models [Studies in Language Companion Series 150], John Benjamins Publishing Company Amsterdam and New York
  • Xuedong Huang, Alex Acero, Hsiao-Wuen Hon 2001, Spoken language processing, Prentice Hall PTR Upper Saddle River, N.J. [ISBN: 0130226165]
  • David Harel 2004, Algorithmics, Addison-Wesley Boston, Mass. [ISBN: 0321117840]
  • Grant S. Ingersoll, Thomas S. Morton, Andrew L. Farris, Taming Text, Manning Publications [ISBN: 193398838X]
  • Jeff McNeil, Python 2.6 Text Processing: Beginner's Guide, Packt Publishing [ISBN: 1849512124]
  • David Mertz 2003, Text processing in Python, Addison-Wesley Boston [ISBN: 0321112547]
  • Brian Nolan and Elke Diedrichsen 2013, Linking Constructions into functional linguistics: The role of constructions in RRG grammars [Studies in Language Companion Series 145], John Benjamins Publishing Company Amsterdam and New York
This module does not have any article/paper resources
Other Resources

Module Delivered in

Programme Code Programme Semester Delivery
BN_KMHLT_R Master of Science in Computing in Multimodal Human Language Technology 1 Mandatory