Mastering NLP with spaCy – Part 2: Unlocking the structure of language through Part-of-Speech tagging, dependency parsing, and Named Entity Recognition to understand how words function, relate to one another, and represent real-world entities in text.
In natural language, a word's meaning depends on its context, its relationships with other words, and the real-world things it refers to, so interpretation is often ambiguous even for humans. To build systems that interpret text reliably, we rely on key NLP tasks that each uncover a different layer of meaning.

Part-of-speech (POS) tagging classifies words by their grammatical role (noun, verb, adjective, adverb, and so on), which shows how each word functions in a sentence. spaCy exposes coarse universal tags such as "NOUN" or "VERB" through token.pos_ and fine-grained tags such as "VBD" (verb, past tense) through token.tag_; spacy.explain() returns a short description of any tag. The tag assigned to a word depends on its context, that is, on the surrounding words and the grammatical structure, and is predicted by a trained statistical model.

Beyond tagging, dependency parsing reveals how words are syntactically related. It builds a tree in which each word (the child) is attached to a parent (the head) by a labeled dependency relation. For example, in "red car", "car" is the root and "red" is an adjective modifying it, so "red" is attached to "car" with the label "amod" (adjectival modifier). This shows how meaning is built compositionally. A word can have several children but only one head, and relations like "nsubj" (nominal subject), "dobj" (direct object), and "advmod" (adverbial modifier) map out the structure of the sentence. spaCy’s displacy module visualizes these trees, making the relationships easy to see.

Finally, Named Entity Recognition (NER) identifies spans of text that refer to real-world entities such as people, organizations, locations, dates, or events. In the sentence "Rome is the best city in Italy based on my Google search", spaCy recognizes "Rome", "Italy", and "Google" as named entities and labels them GPE (geopolitical entity), GPE, and ORG (organization), respectively. The doc.ents attribute returns these entities, and spacy.explain() describes each entity type.
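The attributes described above can be tried out with a minimal sketch like the following. To keep it deterministic and independent of any downloaded model, the annotations are written by hand and passed to the Doc constructor; they are assumptions that stand in for what a trained pipeline would predict, and the variable names (phrase, sentence, iob) are purely illustrative.

```python
import spacy
from spacy.tokens import Doc

# A blank English pipeline provides a vocabulary but no trained tagger,
# parser, or entity recognizer.
nlp = spacy.blank("en")

# Hand-written annotations (an assumption for illustration). With a trained
# pipeline installed, you would instead write something like:
#   nlp = spacy.load("en_core_web_sm"); phrase = nlp("red car")
phrase = Doc(
    nlp.vocab,
    words=["red", "car"],
    pos=["ADJ", "NOUN"],    # coarse universal POS tags
    tags=["JJ", "NN"],      # fine-grained tags
    heads=[1, 1],           # index of each token's head; the root points to itself
    deps=["amod", "ROOT"],  # label of the relation between token and head
)

for token in phrase:
    children = [child.text for child in token.children]
    print(token.text, token.pos_, token.tag_, token.dep_, token.head.text, children)

# spacy.explain() looks up a short description of a tag or label.
print(spacy.explain("VBD"), "|", spacy.explain("amod"))

# Entities are supplied here as one IOB tag per token (again hand-written).
words = "Rome is the best city in Italy based on my Google search".split()
iob = ["B-GPE", "O", "O", "O", "O", "O", "B-GPE", "O", "O", "O", "B-ORG", "O"]
sentence = Doc(nlp.vocab, words=words, ents=iob)

for ent in sentence.ents:
    print(ent.text, ent.label_, spacy.explain(ent.label_))
```

With a trained pipeline the same loops work unchanged, but the predicted tags, heads, and entities may differ from these hand-written values depending on the model and its version.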
Visualizing NER results with displacy highlights each entity span together with its label, which makes the output easier to inspect.

Together, POS tagging, dependency parsing, and NER provide a structured way to analyze text: they expose word roles, syntactic relationships, and real-world references. These techniques form the foundation for applications like question answering, information extraction, and sentiment analysis. Tools like spaCy simplify the implementation by making these annotations available directly on the processed document.
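The displacy visualizations mentioned above can be produced with a short sketch along these lines. It again uses hand-annotated Doc objects instead of a trained model, so the tree and the entity labels are assumptions made for illustration, and the names parsed, tagged, dep_svg, and ent_html are hypothetical.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # vocabulary only; no trained components

# Hand-annotated stand-ins for the output of a trained pipeline.
parsed = Doc(
    nlp.vocab,
    words=["red", "car"],
    pos=["ADJ", "NOUN"],
    heads=[1, 1],
    deps=["amod", "ROOT"],
)
words = "Rome is the best city in Italy based on my Google search".split()
iob = ["B-GPE", "O", "O", "O", "O", "O", "B-GPE", "O", "O", "O", "B-ORG", "O"]
tagged = Doc(nlp.vocab, words=words, ents=iob)

# jupyter=False makes render() return the markup as a string rather than
# attempting to display it inline in a notebook.
dep_svg = displacy.render(parsed, style="dep", jupyter=False)
ent_html = displacy.render(tagged, style="ent", jupyter=False)

print(len(dep_svg), "characters of SVG for the dependency tree")
print(len(ent_html), "characters of HTML for the highlighted entities")
```

The returned strings can be written to a file and opened in a browser; inside a notebook, passing jupyter=True displays the result inline instead.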