I have recently published Recommendation.jl, a Julia package for recommender systems. The package is already registered in METADATA.jl, so it can be installed by:
$ julia
julia> Pkg.add("Recommendation")
Last year, I took Introduction to Recommender Systems, an online course created by University of Minnesota, on Coursera. Although the course assignments originally require us to use spreadsheet (on Google Drive or MS Excel), I personally solved all of them by programming in Julia. Recommendation.jl is created as a result of the coursework assignments.
This article explains about the details of Recommendation.jl. If you are interested in functionality and usage, you can refer to README of the repository. In case you have comments or suggestions for future update, issues and/or pull requests are very welcome!
Note: The online course is now opening as Recommender Systems Specialization. Course contents might be different from the previous version that I have completed.
Why Julia?
Julia is a relatively new programming language developed by computer scientists at MIT, and the language mainly focuses on high-performance scientific computing by utilizing the just-in-time compiler. Conventionally, MATLAB has been widely used for numerical computing, but it is in some sense inefficient proprietary software. Thus, open-sourced Julia's efficient implementation is getting the attention of research communities in these days: Julia: A Fast Language for Numerical Computing.
We can readily use various scientific algorithms on MATLAB and Julia by integrating third-party packages, and their syntax dedicated to vector and matrix computations definitely accelerates algorithm development both in industry and academia. However, in terms of recommender systems, there are currently no effective Julia packages which enable us to make fundamental CF, SVD and MF-based recommendation. Therefore, I developed a basic toolkit for recommender systems in Julia.
Static analysis on a local machine with classical techniques is essentially important as the first step for building your own recommender systems, and the package helps you to analyze own user-item data. Additionally, since the implementation and Julia itself are highly flexible, implementing new algorithms on the package should be easy.
Basic structure
In general, recommender systems somehow handle a number of events which represents user-item interactions. So, our package describes each event as an Event
composite type:
type Event
user::Int
item::Int
value::Float64
end
A field value
can be unary or arbitrary real number depending on the feedback types.
In order to represent a series of Event
, we define a data accessor as follows:
immutable DataAccessor
events::Array{Event,1}
R::AbstractMatrix
user_attributes::Dict{Int,Any} # user => attributes
item_attributes::Dict{Int,Any} # item => attributes
end
This abstraction allows us to interchangeably access to an array of Event
and user-item matrix. Notice that user and item attributes can be stored as key-value pairs on DataAccessor
for content-based and feature-based recommenders.
Once a dataset is converted into a data accessor, we can launch various kinds of recommenders on the package. Importantly, all of the recommendation techniques implemented in the package are based on an abstract base type Recommender
, and the following functions should be implemented on each recommender:
build(rec::Recommender; kwargs...)
- Building a recommendation engine from a data accessor.
check_build_status(rec::Recommender)
- Check whether recommender is already built before making recommendation.
predict(rec::Recommender, u::Int, i::Int)
- Making prediction for a given user-item pair.
ranking(rec::Recommender, u::Int, i::Int)
- Computing a ranking score for a given user-item pair.
recommend(rec::Recommender, u::Int, k::Int, candidates::Array{Int})
- Making top-$k$ recommendation for a given user from a list of item candidates.
In particular, what recommend()
does is to compute ranking scores for all possible user-item pairs and return top-$k$ highest-ranked items. Since this recommendation procedure is always same regardless of recommenders, the function is precomposed in the package:
function recommend(rec::Recommender, u::Int, k::Int, candidates::Array{Int})
d = Dict{Int,Float64}()
for candidate in candidates
score = ranking(rec, u, candidate)
d[candidate] = score
end
sort(collect(d), by=tuple->last(tuple), rev=true)[1:k]
end
The recommend()
function actually works correctly, but the implementation is not efficient enough; computing the ranking scores one-by-one might be computationally expensive, especially for the massive item candidates. Hence, the function should be improved in the future by taking more efficient approaches such as parallelization.
Another common function is check_build_status()
. The function checks a recommenders' build state and throws an error if it is still not built:
function check_build_status(rec::Recommender)
if !haskey(rec.states, :is_built) || !rec.states[:is_built]
error("Recommender $(typeof(rec)) is not built before making recommendation")
end
end
Implementing a recommendation algorithm on Recommendation.jl
To give an example, the following code demonstrates implementation of the simple popularity-based recommender:
immutable MostPopular <: Recommender
da::DataAccessor
scores::AbstractVector
states::States
end
MostPopular(da::DataAccessor, hyperparams::Parameters=Parameters()) = begin
n_item = size(da.R, 2)
MostPopular(da, zeros(n_item), States(:is_built => false))
end
function build(rec::MostPopular)
n_item = size(rec.da.R, 2)
for i in 1:n_item
rec.scores[i] = countnz(rec.da.R[:, i])
end
rec.states[:is_built] = true
end
function ranking(rec::MostPopular, u::Int, i::Int)
check_build_status(rec)
rec.scores[i]
end
The MostPopular
recommender type is initialized by a data accessor, and a recommendation engine can be built by counting the number of nonzero elements in each column (i.e., item) of a user-item matrix. Ultimately, the frequency is the ranking scores which determine the most promising items.
It should be noticed that predict()
does not necessarily to be implemented because recommend()
internally uses scores obtained from ranking()
. In case only predict()
is implemented on a recommender, ranking()
works as an alias of the predict()
function by default:
function predict(rec::Recommender, u::Int, i::Int)
error("predict is not implemented for recommender type $(typeof(rec))")
end
function ranking(rec::Recommender, u::Int, i::Int)
check_build_status(rec)
predict(rec, u, i)
end
The separation of predict()
and ranking()
functions is based on LibRec, a Java library for recommender systems.
Consequently, when researchers and engineers like to implement new recommendation techniques in Julia by using the Recommendation.jl library, they simply need to care about the following three points:
- How to build a recommender from a data accessor,
- How to make prediction for a given user-item pair,
- How to compute a ranking score for a given user-item pair,
in addition to converting own data into a data accessor. The flexible data accessors and Julia's dynamic type systems clearly make the package extensible. Note that the idea of the flexible implementation originally comes from LensKit, a recommender-specific toolkit running in JVM. LensKit is fully taking advantage of flexibility achieved by dependency injection.
Example of matrix-factorization-based recommendation
Integrating a data accessor, recommender and metric eventually enable us to compute the accuracy of recommendation. Here, I give an example of the whole procedure using Recommendation.jl:
using Recommendation
const n_user = 5
const n_item = 10
# Create a data accessor from events
events = [Event(1, 2, 1), Event(3, 2, 1), Event(2, 6, 4)]
da = DataAccessor(events, n_user, n_item)
# Initialize a MF-based recommender with the data accessor
# The number of latent factors is set to 2
recommender = MF(da, Parameters(:k => 2))
# Build the MF-based recommender based on the SGD optimization
build(recommender, learning_rate=15e-4, max_iter=100)
# Make top-k recommendation for a user from a set of item candidates
u = 4
k = 2
candidates = [i for i in 1:n_item] # all items
recommend(recommender, u, k, candidates)
recommend()
eventually returns tuples of sorted item IDs and their scores.
What's next
This article has introduced the basic concept underlying my new Julia package for recommender systems.
Currently, the package has very limited functions what I have learnt from the "Introduction to Recommender Systems" course. More powerful recommendation techniques (e.g., Tensor Factorization, Factorization Machines) should be implemented in the future. Moreover, creating update()
method is another promising direction in order to test recommendation algorithms in an incremental fashion.
In terms of efficiency, current implementation is naive and does not take advantage of Julia's performance. This view should be considered more to improve feasibility of the package. Meanwhile, comparison with the other recommender-specific libraries is important.
Recommendation.jl is still in version 0.0.1. Keep waiting till new releases and further improvements!
This article is part of the series: Building Recommender Systems in JuliaShare
Categories
Recommender Systems Programming
See also
- 2022-01-08
- Recommendation.jl v0.4.0: Working with Missing Values, Data Typing, and Factorization Machines
- 2019-07-26
- Lightning Talk about Recommender Systems in Julia at #JuliaCon 2019
- 2019-03-31
- Publishing My Master's Thesis with Documenter.jl
Last updated: 2022-05-07
Author: Takuya Kitazawa
Takuya Kitazawa is a freelance software developer based in British Columbia, Canada. As a technologist specializing in AI and data-driven solutions, he has worked globally at Big Tech and start-up companies for a decade. At the intersection of tech and society, he is passionate about promoting the ethical use of information technologies through his mentoring, business consultation, and public engagement activities. See CV for more information, or contact at [email protected].
Now Gift a cup of coffeeDisclaimer
- Opinions are my own and do not represent the views of organizations I am/was belonging to.
- I am doing my best to ensure the accuracy and fair use of the information. However, there might be some errors, outdated information, or biased subjective statements because the main purpose of this blog is to jot down my personal thoughts as soon as possible before conducting an extensive investigation. Visitors understand the limitations and rely on any information at their own risk.
- That said, if there is any issue with the content, please contact me so I can take the necessary action.