Welcome!

This is my personal webpage.

My name is ir. Jan Vandepitte and I work at Ghent University in the Department of Reproduction, Obstetrics and Herd Health, on the Bovine Buitenpraktijk+ team. I dabble in statistics, programming and animal science.

At the moment I am working on GplusE, a European project that studies the link between the genotype and environmental factors of dairy cows.

Connect with me on LinkedIn: .

This website is built using reveal.js.

GplusE

My work for GplusE

Data exploration of WP3 data using an R shiny app

Additional data exploration and analysis using Rmarkdown

Bitbucket explanation

R package explanation

# Install the APIhelpers package from Bitbucket if it is not yet available
if (!require("APIhelpers")) {
  if (!require("devtools")) {
    install.packages("devtools")
    library("devtools")
  }
  devtools::install_bitbucket("bovianalytics/r-package-ddw-api",
                              ref = "R-package", subdir = "API-helpers",
                              auth_user = "bitbucketusername",
                              password = "bitbucketpassword")
  library(APIhelpers)
}
getPass()                     # prompts for your DDW username and password
data <- getAllGplusEFiles()   # downloads all WP3 data into the list `data`

The code above will:

  • Download and install the R package from Bitbucket using the devtools package (and install devtools itself if necessary)
  • Ask for your username and password for DDW (alternatively, in non-interactive code you can use the function setLogin())
  • Download all WP3 data and put it into the list data

GplusE toolkit executables

The GplusE toolkit is composed of the following components:

Loadwebapi.EXE

This executable has four parameters:

  1. "username" for DDW
  2. "password" for DDW
  3. "select": a number selecting the dataset:
    1. GplusE WP3 data: "1"
    2. Herd-level KPI: "2"
    3. Individual herd data: "3"
  4. "herd": required when downloading the individual herd data

These parameters can be passed to the program in three ways:

  1. First it looks in the appSettings section of Loadwebapi.exe.config
  2. Next it looks at parameters passed on the command line (e.g. from an MS-DOS prompt)
    
    Loadwebapi.exe ddwuser ddwpass 3 17
                                            
  3. Finally it asks for them directly on the console

Example configuration of Loadwebapi.exe:


<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.2" />
  </startup>
  <appSettings>
    <add key="username" value="ddwuser" />
    <add key="password" value="ddwpass" />
    <add key="select" value="1" />
  </appSettings>
</configuration>

CowStateMachine.EXE

In the shiny app that explores the WP3 data, one can go to the "design DIM vs interbull" tab page and click through the different phenotypes collected. Note that the experimental designs of these phenotypes (and even across countries) are very different, so merging these designs for any analysis is a serious task. For this purpose the cow state machine was developed, based on the theories and practices of sliding windows, state machines, event sourcing and the Datomic database. As of now it is a purely sequential and deterministic implementation, but it could conceivably be implemented concurrently and non-deterministically, e.g. using an actor framework.

For simplicity, the implementation is a .NET console application that can be run locally, for the following reasons:

  • The small size of the WP3 dataset: data up to about 10 GB can be analysed locally.
  • The diversity of statistical programs used by the GplusE scientists (R, SAS, MATLAB, MySQL, ...)
  • Data sharing considerations that plague projects with several stakeholders

The rules governing CowStateMachine.EXE

  1. The program loads the files found in the GplusE subdirectory whose filename ends with "with-header.csv". This is the default location where Loadwebapi.exe downloads the WP3 data. Other files that are placed or generated in that folder and abide by these conventions will also be picked up.
  2. These files must be semicolon separated (;) and without quotes around character strings. Each file must have a header row containing the column names, and there must be an InterbullNumber column.
  3. There must be an Animal-with-header.csv file containing all animals and their InterbullNumber.
  4. There must be a Calving-with-header.csv file containing all calvings of the animals and their LactationNumber.
  5. Based on the fact files, a history is built up from days in milk (DIM) 1 to 50 for each cow. The facts in the files are considered mutable or immutable in their timeline according to the rules below.
  6. The data is written to one cowEventFile.csv file covering days 1 to 50. The set of output days (the design) is configurable.
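Rule 1 can be checked from R, for example (a sketch; it assumes the GplusE subdirectory sits under the current working directory):

```r
# List the fact files CowStateMachine.exe would pick up: every file in
# the GplusE subdirectory whose name ends in "with-header.csv"
factFiles <- list.files(file.path(getwd(), "GplusE"),
                        pattern = "with-header\\.csv$",
                        full.names = TRUE)
factFiles
```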

Rules applied while building the cow history

  1. If the header of the fact file contains no DIM column, the facts in the file are considered immutable and are copied across all days.
  2. If the header of the fact file contains a DIM column, the facts in the file are considered mutable (events) and are copied along the timeline using a sliding window. By default the size of this sliding window is infinite (50 days); this is configurable. If the sliding window size is reduced, the events are copied forward and then backwards in time as long as
    (itIsTrueThat
        (isLessThan
            (theDifferenceBetween
                (timeOf
                    theEvent
                )
                (timeOf
                    now
                )
            )
            theMaximumSlidingWindowSize
        )
    )
    The above code is written as an S-expression so it is also interpretable by domain experts without prior programming experience.
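The same predicate can be sketched in R (a sketch, not the actual .NET implementation; the names are illustrative):

```r
# Sketch: is an event recorded at DIM `eventDim` still copied onto
# timeline day `dim`, given the maximum sliding window size?
withinSlidingWindow <- function(eventDim, dim, maxSlidingWindowSize) {
  abs(eventDim - dim) < maxSlidingWindowSize
}

# With maxSlidingWindowSize = 4 (as in the example configuration below),
# an event at DIM 10 is copied to days 7 through 13:
which(sapply(1:50, function(dim) withinSlidingWindow(10, dim, 4)))
```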

Example configuration of CowStateMachine.exe:

The configuration below will generate an event timeline with a maximum sliding window size of 4 and output the history at days 14 and 35.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.2" />
  </startup>
  <appSettings>
    <add key="maxSlidingWindowSize" value="4" />
    <add key="design" value="14,35" />
  </appSettings>
</configuration>

scripts

The scripts folder contains an example R script, "splitData.R", that splits the Diagnose and ClinicalRegistration files into separate files so they can be independently merged with the timeline.

# Read the semicolon-separated fact files
ClinicalRegistration.with.header <- read.csv(
  paste(getwd(), "/GplusE/ClinicalRegistration-with-header.csv", sep = ""), sep = ";")
Diagnose.with.header <- read.csv(
  paste(getwd(), "/GplusE/Diagnose-with-header.csv", sep = ""), sep = ";")

# Split each file into one data frame per type; the ClinicalRegistration
# split column is assumed by analogy with the Diagnose column
ClinicalRegistration.with.header.split <- split(
  ClinicalRegistration.with.header, ClinicalRegistration.with.header$ClinicalRegistration)
Diagnose.with.header.split <- split(Diagnose.with.header, Diagnose.with.header$Diagnose)

# Write each part back out as <Name>-<value>-with-header.csv
for (name in names(ClinicalRegistration.with.header.split)) {
  write.table(ClinicalRegistration.with.header.split[[name]],
              paste(paste(getwd(), "/GplusE/ClinicalRegistration", sep = ""),
                    name, "with-header.csv", sep = "-"),
              quote = FALSE, row.names = FALSE, sep = ";")
}
for (name in names(Diagnose.with.header.split)) {
  write.table(Diagnose.with.header.split[[name]],
              paste(paste(getwd(), "/GplusE/Diagnose", sep = ""),
                    name, "with-header.csv", sep = "-"),
              quote = FALSE, row.names = FALSE, sep = ";")
}

GplusEToolKitBootstrapper.BAT

The toolkit includes a simple batch script that bootstraps the following process:

  1. Execute Loadwebapi.exe to download the data
  2. Run the Rscript to split the data files
  3. Run the CowStateMachine.exe to merge the data

Putting "REM" before a line comments it out, so that line will not be executed:

REM the line below is commented out so the data won't be downloaded again
REM Loadwebapi.exe ddwuser@acme.gov ddwpass 1
"C:\Program Files\R\R-3.1.3\bin\Rscript.exe" scripts\splitData.R
CowStateMachine.exe

quotes

“Functional Hacker := Think like a fundamentalist, code like a pragmatist.”
~ Erik Meijer @headinthebox

“Large systems are composed of smaller ones and therefore depend on the Reactive properties of their constituents. This means that Reactive Systems apply design principles so these properties apply at all levels of scale, making them composable. The largest systems in the world rely upon architectures based on these properties and serve the needs of billions of people daily. It is time to apply these design principles consciously from the start instead of rediscovering them each time.”
~ The Reactive Manifesto

“Simplicity is hard work. But, there's a huge payoff. The person who has a genuinely simpler system - a system made out of genuinely simple parts, is going to be able to affect the greatest change with the least work. He's going to kick your ass. He's gonna spend more time simplifying things up front and in the long haul he's gonna wipe the plate with you because he'll have that ability to change things when you're struggling to push elephants around.”
~ Rich Hickey, Creator of the Clojure programming language

“I'm teaching you things all the time. You might not be learning them, of course”
~ Lu-Tze, The sweeper - Discworld - Terry Pratchett

“One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures.”
~ Jay Kreps, Linkedin

“If you never kill anything, you will live among zombies. And they will eat your brain.”
~ Gregor Hohpe - Enterprise Integration patterns Ramblings on the subject of legacy software

“When you make a system configurable I think it is useful to ask yourself whether you are really just configuring or whether you are programming at a high level of abstraction.”
~ Gregor Hohpe - Enterprise Integration patterns Ramblings on the subject of configuration

“The real key, I think, is that the composability afforded by the functional design approach means that the same approach can be used for both the highest levels of abstraction and the lowest level-behavior is described by functions all the way down (many machine instructions can also be represented as functions).”
~ Erkki Lindpere - Why the debate on Object-Oriented vs Functional Programming is all about composition