Flowers Analysis

The iris flower

In [1]:
@file:MavenRepository("bintray-plugins","http://jcenter.bintray.com")
@file:DependsOnMaven("de.mpicbg.scicomp:krangl:0.7")

import krangl.*

The first records of the input data (bundled with krangl as irisData) are:

In [2]:
irisData
Out[2]:
   Sepal.Length   Sepal.Width   Petal.Length   Petal.Width   Species
            5.1           3.5            1.4           0.2    setosa
            4.9           3.0            1.4           0.2    setosa
            4.7           3.2            1.3           0.2    setosa
            4.6           3.1            1.5           0.2    setosa
            5.0           3.6            1.4           0.2    setosa

The structure of the input data (150 observations in five columns) is:

In [3]:
irisData.glimpse()
DataFrame with 150 observations
Sepal.Length	: [Dbl]	, [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0]
Sepal.Width	: [Dbl]	, [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4]
Petal.Length	: [Dbl]	, [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5]
Petal.Width	: [Dbl]	, [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2]
Species	: [Str]	, [setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa]

Calculate the mean petal width per species

In [4]:
val summarizeDf: DataFrame = irisData
    .groupBy("Species")
    .summarize("mean_petal_width") { it["Petal.Width"].mean() }
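For readers unfamiliar with the group-then-summarize pattern, the same computation can be sketched with the Kotlin standard library alone. This is only an illustration of the idea, not how krangl implements it: the Flower class, the meanPetalWidthBySpecies function, and the handful of sample rows are hypothetical names and values introduced here, not part of krangl or of the full data set.

```kotlin
// Hypothetical row type: only the two fields this sketch needs.
data class Flower(val species: String, val petalWidth: Double)

// Group the rows by species, then reduce each group to its mean petal width.
fun meanPetalWidthBySpecies(flowers: List<Flower>): Map<String, Double> =
    flowers
        .groupBy { it.species }
        .mapValues { (_, group) -> group.map { it.petalWidth }.average() }

fun main() {
    // Illustrative sample only; the real input has 150 observations.
    val sample = listOf(
        Flower("setosa", 0.2),
        Flower("setosa", 0.4),
        Flower("versicolor", 1.4),
        Flower("versicolor", 1.5),
        Flower("virginica", 2.5),
        Flower("virginica", 1.9)
    )

    // Print one line per species: name, tab, mean petal width.
    for ((species, mean) in meanPetalWidthBySpecies(sample)) {
        println("$species\t$mean")
    }
}
```

groupBy corresponds to the grouping step and mapValues with average() to the summarize step; the result has one entry per species, just as the summarized data frame has one row per group.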

Print the summarized data

In [5]:
summarizeDf.print()
      Species     mean_petal_width
       setosa   0.2459999999999999
   versicolor   1.3259999999999998
    virginica                2.026

Conclusion: Iris flowers of the species virginica have the largest mean petal width (about 2.03), followed by versicolor (about 1.33) and setosa (about 0.25).