Natural Language Processing

My main area of focus is natural language processing (NLP). I completed a Master's in Computer Speech, Text and Internet Technology at Cambridge University in 2008, and since then I have worked exclusively in machine learning, mostly in NLP.

In recent years I have moved into freelance data science consultancy, focusing on NLP. I have built NLP pipelines from scratch and worked on natural language dialogue systems, document classifiers and text-based recommender systems. For these tasks I have used both traditional machine learning techniques and state-of-the-art approaches such as neural networks.

Natural Language Processing technologies that I use

Clustering of documents in the topic Natural Language Processing
Topic detection is an NLP technique that allows you to discover common themes in a set of unstructured documents.

I have worked with a variety of NLP models and techniques, including:

  • Bag of words, tf*idf, cosine similarity
  • NLP pipelines, lemmatisation, parsers, chunkers
  • Deep neural networks
  • Clustering: Latent Dirichlet Allocation
    • This is useful for extracting topics from a set of unstructured documents, for example legal documents, survey responses or factory error reports.
  • Search engines and search term recommenders
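
To illustrate the first item in the list above, here is a minimal sketch of how bag of words, tf*idf and cosine similarity fit together. It uses only the Python standard library. The function names, the whitespace tokeniser and the three sample documents are assumptions made up for this example, not code from any client project, and a real pipeline would normally add lemmatisation and stop word handling.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse tf*idf vector (dict of term -> weight) per document."""
    # Bag of words: term counts per document, ignoring word order.
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, invented for illustration only.
documents = [
    "the motor failed during assembly",
    "motor failure reported on the assembly line",
    "survey respondents liked the new menu",
]
vectors = tfidf_vectors(documents)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this sketch the first two documents share the terms "motor" and "assembly", so their similarity is positive. The first and third share only "the", which occurs in every document and therefore gets a tf*idf weight of zero, so their similarity is zero.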

NLP software

I work with the following software:

  • TensorFlow
  • Keras
  • Python NLTK
  • R

Examples of past Natural Language Processing projects

NLP projects I have worked on for major household names include:

  • a spoken dialogue system to control a smart home
  • an unsupervised text analysis program to analyse text descriptions of manufacturing defects
  • a model to classify jobseekers’ CVs into industries and salary bands
  • analysis of survey responses