Sunday, June 5, 2016

The curl Command and Its Options



curl -L -o donation.zip http://bit.ly/1Aoywaq

http://bit.ly/1Aoywaq : a shortened URL that points to the zipped file "donation.zip"
donation.zip contains multiple zipped files (block_1.zip to block_10.zip) along with documentation and a CSV file that stores the frequency counts.
donation.zip : the destination filename given on the command line, where the downloaded file will be saved

-o:

The curl command downloads data from a URL. By default it writes the result to standard output; the -o and -O options save it to a file instead.
-o (lowercase o) : the result is saved under the filename given on the command line
-O (uppercase O) : the result is saved under the filename taken from the last part of the URL
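
To make the difference between -o and -O concrete, here is a rough Python sketch of how the output filename gets chosen. It is only an illustration, not curl's actual code: the function name output_filename and its parameters are made up for this post, and the example.com URL is a placeholder.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def output_filename(url, o_name=None):
    """Return the local filename a download would be saved under.

    o_name models `-o NAME`. When it is None, the function models `-O`
    and uses the last path segment of the URL (query string ignored).
    This is a hypothetical helper, not part of curl.
    """
    if o_name is not None:
        # -o: the name given on the command line always wins
        return o_name
    # -O: take whatever follows the last "/" in the URL path
    name = PurePosixPath(urlsplit(url).path).name
    if not name:
        raise ValueError("URL has no filename part; give an explicit name instead")
    return name


print(output_filename("http://bit.ly/1Aoywaq", "donation.zip"))
print(output_filename("http://bit.ly/1Aoywaq"))
print(output_filename("http://example.com/files/block_1.zip"))
```

Notice that with -O the short link would be saved as "1Aoywaq", because the name comes from the URL typed on the command line. That is why -o donation.zip is the better choice for shortened URLs.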

-L : 

If the source file has moved to another location, the server responds to curl with an HTTP redirect (a 3xx status) whose Location header gives the new address. With -L on the command line, curl reads that Location header and repeats the request at the new location to get the file. This is why the command above needs -L: a bit.ly short link responds with a redirect to the real file.
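
The redirect-following idea behind -L can be sketched in a few lines of Python. This is a simulation over a made-up table of responses, not a real HTTP client and not how curl is implemented. The names follow_redirects and EXAMPLE_RESPONSES and the .example URLs are all assumptions for illustration. The limit of 50 hops mirrors curl's documented default for --max-redirs.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)

# Hypothetical responses: URL -> (status code, Location header or None)
EXAMPLE_RESPONSES = {
    "http://short.example/abc": (301, "http://files.example/old/donation.zip"),
    "http://files.example/old/donation.zip": (302, "/new/donation.zip"),
    "http://files.example/new/donation.zip": (200, None),
}


def follow_redirects(url, responses, max_redirects=50):
    """Return the list of URLs visited, ending at the final location."""
    visited = [url]
    while True:
        status, location = responses[url]
        if status not in REDIRECT_CODES or location is None:
            # Not a redirect: this is where the file actually lives
            return visited
        if len(visited) - 1 >= max_redirects:
            raise RuntimeError("too many redirects")
        # A Location header may be relative, so resolve it against the current URL
        url = urljoin(url, location)
        visited.append(url)


for hop in follow_redirects("http://short.example/abc", EXAMPLE_RESPONSES):
    print(hop)
```

Without -L, curl would stop at the first entry and save the redirect response itself rather than the file.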

An alternative to curl is the wget command, which can also download a file or page from a URL for offline use.




Friday, June 3, 2016

#Tweepsmap : Twitter analytics

Tweepsmap (http://tweepsmap.com/) is an interesting Twitter analytics tool. It supports account-specific social network analysis: it lets you find out who follows you, broken down by country, state/province and city. As its website says, it is a geo-targeted Twitter analytics and management tool. It creates an interactive map showing where your followers live, which you can drill down into for further analysis.

Another interesting feature is follower segmentation. This feature lets us segment an audience using a range of filters such as gender, location, follower count and the keywords found in their profiles.

There are several other interesting features, such as finding out who unfollowed you and seeing who mentioned you, plotted on the map.


Sunday, May 29, 2016

SAS Enterprise Guide 7.1


SAS Enterprise Guide is a boon for those who like point-and-click software. It does not require any coding; instead, it generates the code for you. If you are not good at coding or not interested in it, this is the tool for you.

It is not open-source software like R, but it has several good features and a great deal of stability, which is why big companies still choose it.

It improves data quality and programming quality by giving a visual view of the complete process flows and the links between them. The interface might look intimidating at first, but once you get used to the software you will find it helpful. It is easy to spot errors in the flow, and the flow gives a better understanding of the underlying business process.

The drag-and-drop pieces are well structured. There is context-sensitive help for understanding the code. One good feature is that it is easy to generate multiple output types after running a query.

Hadoop certification - Hortonworks or Cloudera


How does certification help?

Professionals with certifications mostly have an edge over others. Many of these certifications are designed so that the knowledge gained while preparing for them is on par with industry standards.

Cloudera or Hortonworks - Which one is better for Hadoop certification?

If you are planning to get a Hadoop certification, this question must have crossed your mind. I did a little research on the internet and found that Cloudera's certifications are the more popular ones. Of course, popularity is not the same as quality. Cloudera offers more courses than Hortonworks and is highly recognised by recruiters. It has been around longer and has a larger market share. Hortonworks is a young startup in this area and might catch up soon. My opinion is that it doesn't really matter which certification you have, but having one on your resume would help in the resume selection process.