Taking recommendation technology to the masses - Le Zhang (Microsoft)

Challenges:

  1. Limited resources
  2. Fragmented solutions

https://github.com/Microsoft/recommenders contains modular functions for model creation, data manipulation, evaluation etc.

Collaborative Filtering

Memory based method:

  1. Microsoft Smart Adaptive Recommendation (SAR) algorithm

Model based methods

  1. Matrix factorization methods

    1. Singular Value Decomposition (SVD)
    2. Spark ALS
  2. Neural network-based methods

    1. Restricted Boltzmann Machine (RBM)
    2. Neural Collaborative Filtering (NCF)

SAR (ipynb)
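
A minimal sketch of the SAR idea, assuming the usual description of the algorithm: build an item-item similarity from co-occurrence counts, then score items as user affinity times item similarity. The function name `sar_scores`, the choice of Jaccard similarity, and the toy data are assumptions made for illustration; this is not the code from the notebook, and it leaves out time decay of the affinity values.

```python
import numpy as np

def sar_scores(affinity: np.ndarray, remove_seen: bool = True) -> np.ndarray:
    """Score every item for every user as affinity @ item-item similarity."""
    # Binary interaction matrix: 1 where the user interacted with the item at all.
    seen = (affinity > 0).astype(float)
    # Co-occurrence: cooc[i, j] = number of users who interacted with both i and j.
    cooc = seen.T @ seen
    # Jaccard similarity: c_ij / (c_ii + c_jj - c_ij), with 0 where undefined.
    diag = np.diag(cooc)
    denom = diag[:, None] + diag[None, :] - cooc
    similarity = np.divide(cooc, denom, out=np.zeros_like(cooc), where=denom > 0)
    scores = affinity @ similarity
    if remove_seen:
        # Do not recommend items the user has already interacted with.
        scores = np.where(seen > 0, -np.inf, scores)
    return scores

# Toy data (made up): 3 users x 4 items, values are interaction strengths.
affinity = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
])
scores = sar_scores(affinity)
top_item = scores.argmax(axis=1)  # best unseen item per user
```

On this toy data, user 0 is recommended item 3 because item 3 co-occurs with item 0, which user 0 weights most heavily.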

Neural Collaborative Filtering (NCF)
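
A rough sketch of an NCF-style forward pass, assuming the commonly described architecture: a generalized matrix factorization (GMF) branch that multiplies user and item embeddings element-wise, an MLP branch over the concatenated embeddings, and a sigmoid over the fused features. The weights are random rather than trained, the single hidden layer and all sizes are arbitrary assumptions, and the names (`ncf_predict`, `P_gmf`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 7, 4  # toy sizes, chosen arbitrarily

# Separate embedding tables for the two branches (random here, learned in practice).
P_gmf = 0.1 * rng.normal(size=(n_users, k))
Q_gmf = 0.1 * rng.normal(size=(n_items, k))
P_mlp = 0.1 * rng.normal(size=(n_users, k))
Q_mlp = 0.1 * rng.normal(size=(n_items, k))

# One hidden layer for the MLP branch, then output weights over [GMF ; MLP].
W1 = 0.1 * rng.normal(size=(2 * k, k))
b1 = np.zeros(k)
h = 0.1 * rng.normal(size=2 * k)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def ncf_predict(user: int, item: int) -> float:
    """Predicted interaction probability for one (user, item) pair."""
    # GMF branch: element-wise product of the two embeddings.
    gmf = P_gmf[user] * Q_gmf[item]
    # MLP branch: ReLU layer over the concatenated embeddings.
    mlp = np.maximum(0.0, np.concatenate([P_mlp[user], Q_mlp[item]]) @ W1 + b1)
    # Fuse both branches and squash to (0, 1).
    return float(sigmoid(np.concatenate([gmf, mlp]) @ h))

p = ncf_predict(0, 3)
```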

Content-based Filtering

e.g. Factorization machines

\begin{equation} \hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^n \langle v_i, v_j \rangle x_i x_j \end{equation}
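
The equation above can be sketched in NumPy as follows. The naive version loops over all pairs directly; the fast version uses the standard identity that rewrites the pairwise term as half of (square of sum minus sum of squares) per latent factor, which brings the cost from O(k n^2) down to O(k n). The function names and the toy parameters are assumptions for illustration; `V` holds one latent vector v_i per row.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Direct evaluation of the FM equation with an explicit pair loop."""
    n = len(x)
    y = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return float(y)

def fm_predict_fast(x, w0, w, V):
    """Same prediction in O(k n) using the sum-of-squares identity."""
    sum_vx = V.T @ x                 # per factor f: sum_i v_{i,f} x_i
    sum_v2x2 = (V ** 2).T @ (x ** 2) # per factor f: sum_i v_{i,f}^2 x_i^2
    pairwise = 0.5 * np.sum(sum_vx ** 2 - sum_v2x2)
    return float(w0 + w @ x + pairwise)

# Toy parameters (made up): n = 2 features, k = 2 latent factors.
x = np.array([1.0, 2.0])
w0 = 0.5
w = np.array([1.0, -1.0])
V = np.array([[1.0, 0.0],
              [1.0, 1.0]])

y_naive = fm_predict_naive(x, w0, w, V)
y_fast = fm_predict_fast(x, w0, w, V)
```

Here the linear part is 0.5 + 1 - 2 = -0.5 and the single pairwise term is 1 * 1 * 2 = 2, so both functions return 1.5.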

TODO xDeepFM

(Guo et al., 2017), (Lian et al., 2018)

TODO Deep Knowledge-aware Network (Wang et al., 2018)

Multi-channel word-entity aligned knowledge aware CNN

Operationalizing a real-time recommender

Scaling Data Science Teams - Miguel Rios (Twitter)

2 models of DS teams in engineering-driven organizations:

embedded model
data scientists part of a smaller team, with other engineers
  • Pros:
    • Dedicated data science resourcing
    • Alignment between DS and the rest of the team
    • One roadmap, fewer dependencies
    • Data science has a more natural “seat at the table”
  • Cons:
    • Rigid resourcing (harder to move DS between teams)
    • Barriers for collaboration between data scientists
    • Manager may not have domain knowledge (typically an EM)
    • Risk of Data Science being a support or service to eng. team
centralized model
data scientists managed by a data science manager, supporting the product teams
  • Pros:
    • Data scientists working together (collaboration and knowledge sharing)
    • DS manager has domain knowledge (better career dev)
    • Resources can be rebalanced to meet customer demand
    • Advocacy for better and consistent tech (tooling, datasets, etc.)
  • Cons:
    • Coordination between teams (DS and stakeholder) becomes more complicated
    • In eng. centric orgs the DS teams need to influence org roadmap
    • Risk of data science work not being aligned with product
    • Company needs to support one more function

Best of both worlds: centralized org with embedded teams

E.g.

Challenges:

  1. Everyone belongs to at least 2 teams: the centralized DS team and a product team
    1. Risk of meeting and planning overload
    2. Which is their main team?
  2. Risk of mismatch of expectation between DS leadership and product leadership

How to scale this hybrid org structure to ~100 Data Scientists?

~ Create more layers of abstraction:

“A product as a system”:

Growth DS -> Product DS -> Revenue Science
                ^^^
Insights, metrics, data engineering, data visualization

Twitter organizes into:

  • Team charters
  • Swimlanes - clear differentiation between teams
  • Working agreements - what to expect from other teams? (e.g. interactions between the data engineering and notifications DS teams)

Create clear communication channels

Build and strengthen your leadership team

TLDR: align teams with objectives, build structures of your teams: team charters, working agreements, swimlanes, and strong leadership team

Question: Thoughts on self-servicing (end-to-end) data scientists?

Question: How to bridge the gap in understanding between data engineers and data scientists?

Argo: Kubernetes Native Workflows and Pipelines - Greg Roodt, Canva

GitHub project

Machine Learning as Code - YouTube - How Kubeflow uses Argo Workflows as its core workflow engine and Argo CD to declaratively deploy ML pipelines and models.

Argo’s DAG UI looks nice!
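
Argo workflows describe tasks and their dependencies as a DAG. The scheduling idea behind that can be sketched with Python's standard library `graphlib`: tasks whose dependencies are all done form a "wave" that could run in parallel. This is not Argo's API or its YAML format; the task names and the helper `execution_waves` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
deps = {
    "train-a": {"preprocess"},
    "train-b": {"preprocess"},
    "compare": {"train-a", "train-b"},
}

def execution_waves(deps):
    """Group tasks into waves; every task in a wave has all dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves

waves = execution_waves(deps)
```

For this diamond-shaped graph the waves are `preprocess`, then `train-a` and `train-b` together, then `compare`.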

Data Architecture 101 for Your Business - Bence Faludi, Independent Consultant

Data Collection

Storage and Flow

Database Query Engine

Visualization

E.g. stack: Kinesis Data Firehose, S3, Airflow, EMR-Presto (Athena for large jobs), Apache Superset

Bibliography

Guo, H., Tang, R., Ye, Y., Li, Z., & He, X. (2017). DeepFM: a factorization-machine based neural network for CTR prediction. CoRR.

Lian, J., Zhou, X., Zhang, F., Chen, Z., Xie, X., & Sun, G. (2018). xDeepFM: combining explicit and implicit feature interactions for recommender systems. CoRR.

Wang, H., Zhang, F., Xie, X., & Guo, M. (2018). DKN: deep knowledge-aware network for news recommendation. CoRR.