Posts

Showing posts from 2015

Choosing Statistical Model


Anomaly Detection

In this post I'd like to share references and articles I came across while learning anomaly detection techniques: blogs, papers, patents, Wikipedia pages, etc.

Popular anomaly detection techniques: density-based techniques (k-nearest neighbor, local outlier factor, and many more variations of this concept).

Knorr, E. M.; Ng, R. T.; Tucakov, V. (2000). "Distance-based outliers: Algorithms and applications". The VLDB Journal 8 (3–4): 237. doi:10.1007/s007780050006.
Ramaswamy, S.; Rastogi, R.; Shim, K. (2000). "Efficient algorithms for mining outliers from large data sets". Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD '00). p. 427. doi:10.1145/342009.335437. ISBN 1581132174.
Angiulli, F.; Pizzuti, C. (2002). "Fast Outlier Detection in High Dimensional Spaces". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Sci…
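As a companion to the distance-based references above, here is a minimal, self-contained sketch of the k-th-nearest-neighbour distance idea (in the spirit of Ramaswamy et al.): score each point by the distance to its k-th nearest neighbour and treat the highest-scoring points as outliers. The class name, the brute-force O(n^2) search, and the sample data are illustrative assumptions, not code from the original post.

import java.util.Arrays;

// Sketch of distance-based outlier scoring: the larger a point's
// distance to its k-th nearest neighbour, the more anomalous it is.
public class KnnOutlierSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Distance from points[idx] to its k-th nearest neighbour (brute force).
    static double kthNeighbourDistance(double[][] points, int idx, int k) {
        double[] dists = new double[points.length - 1];
        int j = 0;
        for (int i = 0; i < points.length; i++) {
            if (i != idx) {
                dists[j++] = euclidean(points[idx], points[i]);
            }
        }
        Arrays.sort(dists);
        return dists[Math.min(k, dists.length) - 1];
    }

    public static void main(String[] args) {
        // Illustrative data: a tight cluster plus one obvious outlier.
        double[][] points = {
            {1.0, 1.0}, {1.1, 0.9}, {0.9, 1.2}, {1.2, 1.1},
            {5.0, 5.0}
        };
        int k = 2;
        for (int i = 0; i < points.length; i++) {
            double score = kthNeighbourDistance(points, i, k);
            System.out.printf("point %d -> k-NN distance %.3f%n", i, score);
        }
        // Points with the largest k-NN distance are flagged as outliers.
    }
}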

18 mistakes that kill Startups


Creating Service for starting/stopping Tomcat Unix

Create the init script in /etc/init.d/tomcat7 with the contents below (your script should work too, but I think this one adheres more closely to the standards). This way Tomcat will start only after the network interfaces have been configured.

Init script contents:

#!/bin/bash
### BEGIN INIT INFO
# Provides:          tomcat7
# Required-Start:    $network
# Required-Stop:     $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start/Stop Tomcat server
### END INIT INFO

PATH=/sbin:/bin:/usr/sbin:/usr/bin

start() {
  sh /usr/share/tomcat7/bin/startup.sh
}

stop() {
  sh /usr/share/tomcat7/bin/shutdown.sh
}

case $1 in
  start|stop) $1 ;;
  restart) stop; start ;;
  *) echo "Run as $0 <start|stop|restart>"; exit 1 ;;
esac

Change its permissions and add the correct symlinks automatically:

chmod 755 /etc/init.d/tomcat7
update-rc.d tomcat7 defaults

And from now on it will be started and shut down automatically upon entering the appropriate runlevels.

Change Java Version on a Debian Machine

1. Install OpenJDK 1.7:

$ sudo apt-get install openjdk-7-jdk openjdk-7-jre

2. Update the Java alternatives:

$ sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                             Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      auto mode
  1            /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java   1061      manual mode
  2            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1051      manual mode

Press enter to keep the current choice[*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java to provide /usr/bin/java (java) in manual mode

$ java -version
java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1~deb7u1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

Spark Codes

Clustering

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val data = sc.textFile("data.csv")
val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

val numClusters = 6
val numIterations = 300
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Evaluate the clustering by computing the Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)

// Assign each point to a cluster and save the labels
val labeledVectors = clusters.predict(parsedData)
labeledVectors.saveAsTextFile("labels")   // saveAsTextFile needs an output path

val centers = clusters.clusterCenters

===================================================================

scala> textFile.count() // Number of items in this RDD
res0: Long = 126

scala> textFile.first() // First item in this RDD
res1: String = # Apache Spark

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09

scala> te…

Building Search Engine like Google

Interesting Reads: http://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf
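The linked paper covers the linear algebra behind Google's PageRank. As a rough companion, here is a minimal power-iteration sketch over a tiny hard-coded link graph; the graph, damping factor, and iteration count are illustrative assumptions, not taken from the paper.

// Minimal PageRank power-iteration sketch over a tiny, hard-coded link graph.
public class PageRankSketch {

    public static void main(String[] args) {
        // adjacency: links[i] lists the pages that page i links to
        int[][] links = { {1, 2}, {2}, {0}, {0, 2} };
        int n = links.length;
        double damping = 0.85;
        double[] rank = new double[n];
        java.util.Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < 50; iter++) {
            double[] next = new double[n];
            // teleportation term: (1 - d) / n for every page
            java.util.Arrays.fill(next, (1.0 - damping) / n);
            for (int i = 0; i < n; i++) {
                double share = damping * rank[i] / links[i].length;
                for (int target : links[i]) {
                    next[target] += share;   // page i passes rank to the pages it links to
                }
            }
            rank = next;
        }
        for (int i = 0; i < n; i++) {
            System.out.printf("page %d: %.4f%n", i, rank[i]);
        }
    }
}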

Machine Intelligence / Learning Landscape


Raspberry Pi

Static IP for WiFi

sudo vi /etc/network/interfaces

Add a network ID string (this is different from the WiFi SSID):

auto lo
iface lo inet loopback

iface eth0 inet dhcp

allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf
iface default inet dhcp

iface home_static inet static
    address 192.168.0.53
    netmask 255.255.255.0
    gateway 192.168.0.1

Perl

Part 1: Recap of data structures, complex data structures, anonymous CDS - hashes
Part 2: Regular expressions, subroutines & files
Part 3: Modules, CPAN/DBI/CGI/N-w
-------------------------------------------------------
Perl identifies the data structure by its prefix symbol:

Scalar - $  - a value
List   - ()
Array  - @
Hashes - %

define a lexical scalar    -  my $a;
default value of a scalar  -  undef
check for undef            -  if (defined($a)) { }
get input from keyboard    -  $a = <STDIN>;    # \n stops
                              @arr = <STDIN>;  # EOF stops
output to console          -  print STDOUT "Hello";
                              print "Hello";
errors                     -  print STDERR "message";
uppercase of scalar        -  $name = uc($name);
lowercase of scalar        -  $name = lc($name);
reverse of scalar          -  $name = reverse($name);
length of scalar           -  $len = length($name);
part of a string           -  $res = substr($str, index, len);
other fns                  -  index($str…

Unix and Hadoop

SCP to a separate port:
scp -P 5050 asd.tar.gz user@192.168.1.15:/home/user

Tomcat webapps location:
/usr/share/tomcat/webapps

Get IP address:
ifconfig eth0 | awk '/inet /{print $2}'

Get Tomcat logs:
tail -f /usr/share/apache-tomcat-7.0.30/logs/catalina.out
tail -f -n 10

Running Hadoop jobs:
set mapred.job.queue.name=dev
hadoop jar acs.jar -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapred.job.queue.name=dev -D mapreduce.task.io.sort.factor=256 -D file.pattern=.*20110110.* -D mapred.reduce.slowstart.completed.maps=0.9 -D mapred.reduce.tasks=10 /Input /Output
hadoop jar test.jar -D mapred.job.queue.name=dev -D mapred.textoutputformat.separator=,

Glassfish webapps folder location:
/usr/share/glassfish4/glassfish/domains/domain1/applications

Killing MySQL processes:
ps -ef | grep mysql | awk -F" " '{system("kill -9 "$2)}'

Starting MySQL:
/etc/init.d/mysqld start

Mo…

Free Dashboards

http://webdesign.tutsplus.com/tutorials/build-a-dynamic-dashboard-with-chartjs--webdesign-14363
http://keen.github.io/dashboards/examples/
http://usebootstrap.com/theme/sb-admin

Handling CSV

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class ReadCVS {

  public static void main(String[] args) {
    ReadCVS obj = new ReadCVS();
    obj.run();
  }

  public void run() {
    String csvFile = "DailyData.csv";
    BufferedReader br = null;
    String line = "";
    String cvsSplitBy = ",";
    try {
      br = new BufferedReader(new FileReader(csvFile));
      while ((line = br.readLine()) != null) {
        // use comma as separator
        String[] splits = line.split(cvsSplitBy);
        System.out.println(splits[4] + splits[5]);
      }
    } catch (FileNotFoundException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      if (br != null) {
        try {
          br.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    }
    System.out.println("Done");
  }
}

Hadoop Part 1: Hello World

Hadoop Hello World: The Word Count Code

The word count code is the simplest program to get you started with the MapReduce framework. The task a word count program performs is as follows: given several text files, count the number of times each word appears in the entire set.

It primarily consists of 3 parts:

Driver: the driver portion of the code contains the configuration details for the Hadoop job, for example the input path, the output path, the number of reducers, the mapper class name, the reducer class name, etc.
Mapper: the role of the mapper in word count is to emit <word, 1> for each word appearing in the document.
Reducer: the role of the reducer in word count is to sum the list of 1's prepared by the shuffle and sort phase, <word, [1,1,1,1,1,1]>, and emit <word, 6>.

It's easier to create an Eclipse Java project and add the relevant Hadoop jar files for the code below.

package com.kush; import java.io.IOException; import java.util.*; import org.apache.had…
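Since the preview cuts off the original listing, here is a minimal word count sketch along the lines described above, using the standard org.apache.hadoop.mapreduce API; the class names and the command-line input/output arguments are illustrative and may differ from the post's original com.kush code.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit <word, 1> for every word in the input line
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the 1's grouped per word and emit <word, count>
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: job configuration; input and output paths come from the command line
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}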