2016 - 2022

Hivemall

Scope

Core contributor to Apache Hivemall, a scalable machine learning library that enables SQL-based machine learning on Apache Hive, Spark, and Pig. Contributed new algorithms, a PySpark integration, and community outreach.

Technology

  • Platform: Apache Hive, Apache Spark (PySpark), Apache Pig
  • Algorithms: Classification, Regression, Recommendation, Anomaly Detection, NLP, Topic Modeling
  • Features: Field-Aware Factorization Machines, Data Sketching, Feature Engineering UDFs
  • Integration: XGBoost, LightGBM, Digdag workflow engine

Key Contributions

  • PySpark Integration: Enabled seamless access to Hivemall capabilities through SparkSession with Hive support, combining Spark/Hive scalability with Python ecosystem flexibility
  • Algorithm Implementation: Contributed state-of-the-art generalized factor models and recommendation techniques
  • Query-Based ML in Production: Simplified end-to-end ML workflows through a SQL-like interface, replacing numerous scattered code fragments with a few dozen lines of queries
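
To make the PySpark and query-based workflow above more concrete, here is a minimal sketch of what it might look like. It is not the project's actual integration code. The jar path, the `training` table, and its `features` and `label` columns are hypothetical, and the class names and training options are written as I recall them from Hivemall's `define-all.hive` script and user guide, so verify them against the Hivemall version you use.

```python
# Hypothetical jar location; replace with the path to a real Hivemall build.
HIVEMALL_JAR = "/path/to/hivemall-all.jar"

# Function name -> implementing class. Assumed to match Hivemall's
# define-all.hive script; check against your Hivemall version.
HIVEMALL_FUNCTIONS = {
    "hivemall_version": "hivemall.HivemallVersionUDF",
    "train_classifier": "hivemall.classifier.GeneralClassifierUDTF",
}


def register_statements(functions=HIVEMALL_FUNCTIONS):
    """Build the statements that expose Hivemall classes as SQL functions."""
    return [
        f"CREATE TEMPORARY FUNCTION {name} AS '{cls}'"
        for name, cls in functions.items()
    ]


def training_query(table, options="-loss logloss -opt SGD"):
    """Build a training query; `table` needs `features` and `label` columns."""
    return (
        "SELECT feature, avg(weight) AS weight\n"
        "FROM (\n"
        f"  SELECT train_classifier(features, label, '{options}')"
        " AS (feature, weight)\n"
        f"  FROM {table}\n"
        ") t\n"
        "GROUP BY feature"
    )


def train_with_spark(table="training", jar=HIVEMALL_JAR):
    """Run the queries through a Hive-enabled SparkSession."""
    # Imported lazily so the query builders above work without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hivemall-sketch")
        .config("spark.jars", jar)   # ship the Hivemall jar with the job
        .enableHiveSupport()         # required to load Hive UDFs and UDTFs
        .getOrCreate()
    )
    for statement in register_statements():
        spark.sql(statement)
    # The result is an ordinary DataFrame of (feature, weight) rows.
    return spark.sql(training_query(table))
```

Calling `train_with_spark()` on a cluster with the jar available would return the model as a regular Spark DataFrame, so it can be passed on to pandas or other Python tooling from there.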

Presentations & Publications

Learn More

  Author: Takuya Kitazawa

I am a product builder, mentor, and advocate for sustainable technology development with a decade of experience in AI/ML products, data systems, and digital transformation. Based in Canada and originally from Japan, I have lived and worked globally, including part-time residence in Malawi, Africa. Visit my portfolio to learn more about my work, or reach out to me at [email protected].