Excellent!

If you are working with medium-sized data (i.e. not too big to fit on one machine, but getting close to that), one R package you need to check out is Matrix, for sparse matrices. It is very easy to use. For example, assume you have three vectors i, j and x, where i and j give the row and column indices of the non-zero entries and x gives their values. Then you can create a sparse matrix via

sm <- sparseMatrix( i = i, j = j, x = x )
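To make that concrete, here is a minimal sketch with made-up i, j and x vectors (the dimensions are inferred from the largest indices):

```r
library(Matrix)

i <- c(1, 2, 3)     # row indices of the non-zero entries
j <- c(1, 3, 4)     # column indices of the non-zero entries
x <- c(10, 20, 30)  # the values at those positions

sm <- sparseMatrix(i = i, j = j, x = x)

dim(sm)    # 3 4 -- inferred from max(i) and max(j)
sm[2, 3]   # 20   -- indexing works like a dense matrix
```

Everything not listed in i/j is an implicit zero, which is exactly what makes the representation so compact.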

You can still slice and dice as you normally would, but to add rows or columns, you use rBind and cBind (rather than rbind and cbind). And you can do fun things like
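A quick sketch of slicing and stacking. Note one assumption: the rBind/cBind helpers were needed in older versions of Matrix; in current versions (and R >= 3.2.0) plain rbind/cbind dispatch on sparse matrices, which is what this sketch uses.

```r
library(Matrix)

sm <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(5, 7))  # a 2 x 3 sparse matrix

sm[1, ]                  # slice a row, just as with a dense matrix
sm[, 2, drop = FALSE]    # keep the result as a (sparse) one-column matrix

tall <- rbind(sm, sm)    # stack rows; the result stays sparse
wide <- cbind(sm, sm)    # add columns; also stays sparse
dim(tall)                # 4 3
dim(wide)                # 2 6
```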

hist( sm@x )

to plot a histogram of the non-zero matrix elements. In a project I was recently working on, memory use for the matrix went from GB to MB, and the run time for doing various things to the matrix went from a couple of hours to about a minute. Of course, it is important to keep your matrix sparse, so you may have to think a little about your code, but it was well worth the effort. Excellent!
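You can see the kind of savings involved with object.size on a toy example (the sizes and fill pattern below are made up for illustration; the GB-to-MB figures above came from a much larger real matrix):

```r
library(Matrix)

# A mostly-zero 10,000 x 100 matrix with only 500 non-zero entries.
n <- 10000
dense <- matrix(0, n, 100)
dense[cbind(sample(n, 500), sample(100, 500, replace = TRUE))] <- runif(500)

sparse <- Matrix(dense, sparse = TRUE)  # convert to sparse storage

object.size(dense)   # ~8 MB: every zero is stored as a double
object.size(sparse)  # a few KB: only the non-zeros and their indices

hist(sparse@x)       # the @x slot holds just the non-zero values
```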

I’m sure you’ve seen all the hype around deep learning. I certainly have, and found two types of references. Some were clearly selling the hype and had virtually no technical details. Others assumed you already knew a lot about deep learning. I finally found a great introduction: http://neuralnetworksanddeeplearning.com/index.html. Seriously excellent.

Wakefield https://github.com/trinker/wakefield is excellent!

Finally, an excellent use for baseball:
http://varianceexplained.org/statistics/beta_distribution_and_baseball/
http://varianceexplained.org/r/empirical_bayes_baseball/
http://varianceexplained.org/r/credible_intervals_baseball/
http://varianceexplained.org/r/bayesian_fdr_baseball/
