Skip to content

Machine Learning

krangl integrates nicely with existing JVM machine learning libraries.

Machine Learning with SMILE

The https://github.com/haifengl/smile projects contains tons of implementations needed for modern data science. Among others

  • Classification Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting, Random Forest, Logistic Regression, Neural Networks, RBF Networks, Maximum Entropy Classifier, KNN, Na├»ve Bayesian, Fisher/Linear/Quadratic/Regularized Discriminant Analysis.

  • Regression Support Vector Regression, Gaussian Process, Regression Trees, Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, Ridge Regression.

  • Feature Selection Genetic Algorithm based Feature Selection, Ensemble Learning based Feature Selection, Signal Noise ratio, Sum Squares ratio.

  • Clustering BIRCH, CLARANS, DBScan, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical Clustering, Sequential Information Bottleneck, Self-Organizing Maps, Spectral Clustering, Minimum Entropy Clustering.

  • Manifold learning IsoMap, LLE, Laplacian Eigenmap, t-SNE, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, MDS

  • Nearest Neighbor Search BK-Tree, Cover Tree, KD-Tree, LSH.

  • Sequence Learning Hidden Markov Model, Conditional Random Field.

  • Natural Language Processing Tokenizer, Keyword Extractor, Stemmer, POS Tagging, Relevance Ranking

Example

It's easy to use krangl and smile to build data science workflows. Here is an example for doing a PCA.

val irisArray = irisData.remove("Species").toArray()

//barchart
pca.varianceProportion.withIndex().plot({ it.index }, { it.value }).geomCol().show()

val projection = pca.setProjection(2).projection

// merge back in the group labels to color scatter
var pc12 = projection.transpose().array().withIndex().deparseRecords {
    mapOf(
        "index" to it.index + 1,
        "x" to it.value[0],
        "y" to it.value[1])
}

pc12 = pc12.leftJoin(irisData.addColumn("index") { rowNumber })

pc12.plot(x="x", y = "y", color = "Species").geomPoint().show()