Jethro's Braindump

SSNLP Conference Notes

Natural Language Processing

AliNLP: the Text Processing Engine that Powers Alibaba’s Business Applications

  • Linlin Li

AliNLP is a large-scale NLP technology platform for the entire Alibaba ecosystem.

  • Utilizes behaviour data instead of demanding human annotations of NLP algorithms
  • Utilizing multiple correlated tasks for improving effectiveness of individual tasks of the complex Alibaba eco-system

NLP Data:

  • language dictionary
  • token & POS corpus
  • Entity Corpus
  • Treebank
  • Sentiment Corpus
  • News Category Corpus
  • Word Relation Graph
  • Knowledge Graph

NLP Foundations:

  • Lexical Analysis
  • Syntactical Analysis
  • Semantic Analysis
  • text Analysis
  • Deep Learning
    • Word2vec
    • Graph2vec

Challenges in Chinese word segmentation application

  • Limited amount of human annotated training data

  • Frequent appearance of out-of-vocabulary words

  • Taobao has billions of user search logs available

    • auto semi-annotated data

Multi-task learning : Combine human annotated data and automatically

Machine Reading Comprehension

  • Aim at answering a question about a given domain/paragraph
  • Using IR based selection method to choose the relevant document/paragraph
  • No Pre-knowledge or FAQ needed

Encode Layer (BiLSTM) -> Attention Layer -> Match Layer (Bilinear Match) -> Output Layer (Pointer Network)

Used in AliMe, Timi, Lazada Chatbot

Ido Dagan:

News Tweet Illustration: What is being said here?

Structured Knowledge Graphs (freebase knowledge graph)

“Natural Representations”: Open IE - isolates propositions as predicate–argument tuple

QA-SRL: Question answering - semantic role labeling

Each QA pair represents one predicate-argument relation QA roles: Gunman takes own life after killing thrtee in Wisconsin spa shooting. Form QASemDep Graph.

co-referring nodes/edges collapsed.

Noah Smith: Synchretizing Structured and Learned Representations

  • Broad-coverage semantic parsing: baselines
  • Building SCAFFOLD from alternative tasks
  • Putting a SPIGOT on the pipeline

NL Semantics : Two Operationalizations

  1. dependencies: a graph, entities/concepts/events as nodes, relations as edges
    • Turboparser (Martins et al)
    • JAMR (Flanigan et. al)
  2. spans: segmentation into labeled parts

Both associate words with predicates and arguments

Linguistic Structure Prediction

x -> differentiable -> y (conventional NN) x -> differentiable -> \(argmax_{y\in Y}S^Ty\) -> structured output (input text) -> (part representations) -> dynamic programming/spanning tree/inference method (trained with loss e.g. hinge loss)

Neurboparser Open SESAME

joint learning Use low-rank, sparse tensors Promoting sparsity :

\begin{equation} \lambda \sum_{y_i, z_j \in C} \left|S\left( y_i, z_j \right) \right| \end{equation}

gave 14x speedup

Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs

Structured Attention, Straight-through estimator (Hinton 2012)

SPIGOT (structured projection of intermediate gradients)

\(\hat{Z}\), the convex hull of intermediate parses

Sentence as graph vs sentence as sequence

dependency tree vs dependency semantics

Automatic Essay Scoring

  • Regression, classification, preference ranking
    • Prompt Independent Features
      • length, syntax, style etc.
    • Argumentation Features
      • argument components
      • argument relations
  • Neural Models
    • No need for feature engineering: word (vectors) as features
    • Hard to interpret results

Essay scoring engines provides no feedback to the student on how to improve the essay.

Question Answering & QANet

-end-to-end models

SQuAD dataset

Base Model


  • Idea #1: Combine Convolution and Self-Attention
  • Convolution: Captures local context
    • But Global interaction requires \(O(\log_kN)\) layers, and interactions become weaker as it goes deeper

Position Encoding -> repeat(Separable Convolution) -> Self Attention -> Feed Forward


  • NMT, en - de -en to get new QA pair

Deep Embedding through transfer learning: