Anomaly Detection

In this post I'd like to share references and articles that I came across while learning anomaly detection techniques: blogs, papers, patents, Wikipedia entries, and so on.

Popular anomaly detection techniques:

Density-based techniques (k-nearest neighbor, local outlier factor, and many more variations of this concept):

Knorr, E. M.; Ng, R. T.; Tucakov, V. (2000). "Distance-based outliers: Algorithms and applications". The VLDB Journal: The International Journal on Very Large Data Bases 8 (3–4): 237. doi:10.1007/s007780050006.

Ramaswamy, S.; Rastogi, R.; Shim, K. (2000). "Efficient algorithms for mining outliers from large data sets". Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD '00), p. 427. doi:10.1145/342009.335437. ISBN 1581132174.

Angiulli, F.; Pizzuti, C. (2002). "Fast Outlier Detection in High Dimensional Spaces". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Sci
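To make the distance-based idea from the references above concrete, here is a brute-force Scala sketch of the scoring used by Ramaswamy et al.: rank every point by the distance to its k-th nearest neighbor and report the top n as outliers. It is a minimal illustration, not the optimized algorithms from the papers; the object and function names (`KnnOutliers`, `kthNeighborDistance`, `topOutliers`) and the sample points are hypothetical.

```scala
object KnnOutliers {
  // Euclidean distance between two points of equal dimension.
  def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Distance from points(i) to its k-th nearest neighbor, excluding the point itself.
  def kthNeighborDistance(points: IndexedSeq[Array[Double]], i: Int, k: Int): Double = {
    require(k >= 1 && k < points.size, "k must be between 1 and points.size - 1")
    val sortedDistances = points.indices.filter(_ != i).map(j => dist(points(i), points(j))).sorted
    sortedDistances(k - 1)
  }

  // Indices of the n points with the largest k-NN distance, most anomalous first.
  def topOutliers(points: IndexedSeq[Array[Double]], k: Int, n: Int): List[Int] =
    points.indices.sortBy(i => -kthNeighborDistance(points, i, k)).take(n).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical data: a tight group of four points plus one isolated point.
    val points = IndexedSeq(
      Array(1.0, 1.0), Array(1.2, 0.9), Array(0.8, 1.1), Array(1.1, 1.2),
      Array(5.0, 5.0)
    )
    println(topOutliers(points, k = 2, n = 1)) // prints List(4)
  }
}
```

Because every point is compared with every other point, this sketch costs O(n²) distance computations; avoiding that cost on large or high-dimensional data is what the referenced papers are about.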

18 Mistakes That Kill Startups

Creating a Service for Starting/Stopping Tomcat on Unix

Create the init script /etc/init.d/tomcat7 with the contents below. A plain start/stop wrapper would also work, but this version adheres more closely to the LSB init-script conventions: because of the Required-Start: $network header, Tomcat will start only after the network interfaces have been configured.

Init script contents:

```bash
#!/bin/bash
### BEGIN INIT INFO
# Provides:          tomcat7
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start/Stop Tomcat server
### END INIT INFO

PATH=/sbin:/bin:/usr/sbin:/usr/bin

# Delegate to the start/stop scripts shipped with Tomcat
start() {
    sh /usr/share/tomcat7/bin/startup.sh
}

stop() {
    sh /usr/share/tomcat7/bin/shutdown.sh
}

# Dispatch on the first argument (quoted so an empty argument hits the usage branch)
case "$1" in
    start|stop) $1 ;;
    restart) stop; start ;;
    *) echo "Run as $0 <start|stop|restart>"; exit 1 ;;
esac
```

As root, change its permissions and add the correct symlinks automatically:

```bash
chmod 755 /etc/init.d/tomcat7
update-rc.d tomcat7 defaults
```

From now on, Tomcat will be started and shut down automatically upon entering the appropriate runlevels.

Change Java Version on a Debian Machine

1. Install OpenJDK 1.7:

```
$ sudo apt-get install openjdk-7-jdk openjdk-7-jre
```

2. Update the Java alternative path:

```
$ sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      auto mode
  1            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      manual mode
  2            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1051      manual mode

Press enter to keep the current choice[*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java to provide /usr/bin/java (java) in manual mode
```

3. Check the active version:

```
$ java -version
java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1~deb7u1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
```

Spark Code Snippets

Clustering

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// sc is the SparkContext provided by spark-shell; each line of data.csv is one comma-separated numeric point
val data = sc.textFile("data.csv")
val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))).cache() // cached because k-means makes many passes

val numClusters = 6
val numIterations = 300
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Within Set Sum of Squared Errors: sum of squared distances from each point to its nearest center
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)

// Pair every point with the index of the cluster it is assigned to, then save the pairs
val labeledVectors = parsedData.map(v => (clusters.predict(v), v))
labeledVectors.saveAsTextFile("clustered-output") // hypothetical output directory

val centers = clusters.clusterCenters
```

Basic RDD operations in the Spark shell:

```
scala> textFile.count() // Number of items in this RDD
res0: Long = 126

scala> textFile.first() // First item in this RDD
res1: String = # Apache Spark

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09

scala> te
```
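To make the `KMeans.train` and `computeCost` calls less of a black box, here is a small single-machine Scala sketch of Lloyd's k-means iteration together with the same within-set sum of squared errors. It is an illustration under simplifying assumptions, not MLlib's implementation: MLlib runs distributed and by default uses k-means|| initialization, whereas this sketch naively takes the first k points as starting centers. The names `LloydKMeans`, `lloyd` and `wssse`, and the sample points, are hypothetical.

```scala
object LloydKMeans {
  type Point = Array[Double]

  // Squared Euclidean distance.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p (the cluster a point is assigned to).
  def closest(centers: Seq[Point], p: Point): Int =
    centers.indices.minBy(i => sqDist(centers(i), p))

  // One Lloyd step: assign points, then move each center to the mean of its members.
  // An empty cluster keeps its previous center.
  def step(points: Seq[Point], centers: Seq[Point]): Seq[Point] = {
    val groups = points.groupBy(p => closest(centers, p))
    centers.indices.map { i =>
      groups.get(i) match {
        case Some(members) =>
          val dim = members.head.length
          Array.tabulate(dim)(d => members.map(_(d)).sum / members.size)
        case None => centers(i)
      }
    }
  }

  // Iterate until the centers stop moving or maxIterations is reached.
  def lloyd(points: Seq[Point], k: Int, maxIterations: Int): Seq[Point] = {
    require(k > 0 && k <= points.size, "need 1 <= k <= number of points")
    var centers: Seq[Point] = points.take(k) // naive initialization (assumption)
    var iteration = 0
    var moved = true
    while (iteration < maxIterations && moved) {
      val next = step(points, centers)
      moved = next.zip(centers).exists { case (a, b) => sqDist(a, b) > 1e-12 }
      centers = next
      iteration += 1
    }
    centers
  }

  // Within Set Sum of Squared Errors: squared distance of each point to its nearest center, summed.
  def wssse(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => sqDist(p, centers(closest(centers, p)))).sum

  def main(args: Array[String]): Unit = {
    val points = Seq(Array(0.0, 0.0), Array(10.0, 10.0), Array(0.0, 1.0), Array(10.0, 11.0))
    val centers = lloyd(points, k = 2, maxIterations = 300)
    println(centers.map(_.mkString("(", ", ", ")")).mkString(" "))
    println("Within Set Sum of Squared Errors = " + wssse(points, centers))
  }
}
```

On the four sample points in `main`, the two centers settle at (0.0, 0.5) and (10.0, 10.5) after two iterations, and the WSSSE is 1.0.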

Building a Search Engine like Google

Interesting read: http://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf
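One core ingredient of a Google-style search engine is link-based ranking such as PageRank. As a hedged sketch, PageRank can be computed by power iteration: each page j gets (1 − d)/n plus d times the sum of rank(i)/outdegree(i) over pages i linking to j, with the commonly used damping factor d = 0.85. Pages with no outgoing links spread their rank evenly over all pages (one common convention among several). The object name `PageRankSketch`, the function `pageRank` and the four-page link graph are hypothetical.

```scala
object PageRankSketch {
  // links(i) = pages that page i links to; pages are numbered 0 until n.
  def pageRank(links: Map[Int, Seq[Int]], n: Int, d: Double = 0.85, iterations: Int = 100): Array[Double] = {
    var rank = Array.fill(n)(1.0 / n) // start from the uniform distribution
    for (_ <- 1 to iterations) {
      val next = Array.fill(n)((1.0 - d) / n) // teleportation share for every page
      for (i <- 0 until n) {
        val out = links.getOrElse(i, Seq.empty)
        if (out.isEmpty) {
          // Dangling page: spread its rank evenly over all pages.
          for (j <- 0 until n) next(j) += d * rank(i) / n
        } else {
          // Split the page's rank equally among its outgoing links.
          for (j <- out) next(j) += d * rank(i) / out.size
        }
      }
      rank = next
    }
    rank
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical 4-page web: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2
    val links = Map(0 -> Seq(1, 2), 1 -> Seq(2), 2 -> Seq(0), 3 -> Seq(2))
    val ranks = pageRank(links, n = 4)
    ranks.zipWithIndex.sortBy(-_._1).foreach { case (r, page) => println(f"page $page%d: $r%.4f") }
  }
}
```

With these links, page 2 (linked from every other page) ends up with the highest rank, and page 3 (no incoming links) keeps only the teleportation share (1 − d)/n = 0.0375. The ranks always sum to 1 because each iteration redistributes exactly the total rank.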