Tuesday, 28 July 2015

DATA AND BASIC TERMINOLOGIES

Data is all around us. But what exactly is it? Data is a value assigned to a thing. Take for example the balls in the picture below.

What can we say about these? They are golf balls, right? First data point: they are used for golf. Golf is a category of sport, so this helps us put the ball in a taxonomy (a classification of things).

DIFFERENT TYPES OF DATA


MAINLY TWO TYPES


QUALITATIVE DATA: 


It is everything that refers to a quality of something and cannot be measured as a number.

eg: colour, texture, the feel of an object, the colour of the sky, a description of an experience, etc.




QUANTITATIVE DATA:



It refers to a number. It can be measured and compared.

eg: the number of golf balls, the size, the price, the score on a test, etc.


OTHER TYPES


CATEGORICAL DATA:



It helps to put the item you are describing into a category. In the case of the golf balls, the condition "used" would be categorical (with categories such as "new", "used", "broken", etc.).


DISCRETE DATA:


It is numerical data that has gaps in it (mainly whole numbers). It is based on counts. Only a finite number of values is possible, and the values cannot be subdivided meaningfully.
eg: the number of golf balls (there is no such thing as 0.3 golf balls), shoe size, the number of students in a class, etc.


CONTINUOUS DATA:


It is numerical data with a continuous range. It can take any numerical value and can be meaningfully subdivided into finer and finer increments.
eg: the diameter of a golf ball (10.356 mm), the height of a person, the size of your foot (as opposed to shoe size, which is discrete), etc.
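The distinctions above can be sketched in a few lines of Python. This is only a rough illustration: the field names, the values and the `kind_of_data` helper are assumptions made for this example, and guessing the kind of data from a value's type is a heuristic, not a rule.

```python
# Illustrative record for a batch of golf balls. Field names and values are
# assumptions for this sketch, borrowed loosely from the examples in the post.
golf_balls = {
    "colour": "white",     # a quality of the ball
    "condition": "used",   # one of a fixed set of categories
    "quantity": 5,         # a count: whole numbers only
    "diameter_mm": 43.0,   # a measurement: any value in a range
}

# Hypothetical list of allowed categories for the "condition" field.
CONDITIONS = {"new", "used", "broken"}


def kind_of_data(value):
    """Make a rough guess at the kind of data a single value represents."""
    if isinstance(value, str) and value in CONDITIONS:
        return "categorical"
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, float):
        return "continuous"
    return "qualitative"


kinds = {name: kind_of_data(value) for name, value in golf_balls.items()}
for name, kind in kinds.items():
    print(name, "->", kind)
```

In real data the type alone is not enough (a shoe size such as 8.5 is stored as a float but is still discrete), so the final call always needs a human who knows what the number means.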


UNSTRUCTURED DATA (DATA FOR HUMANS):


Take the sentence "We have 5 white used golf balls with a diameter of 43 mm at 50 rupees each". A human can understand it easily, but it is difficult for a computer to process. It is not machine readable.


STRUCTURED DATA (DATA FOR COMPUTERS):


Computers are inherently different from humans. If we want a computer to process and analyze data, it has to be able to read the data. The data needs to be in a structured, machine-readable form. One of the most common formats is CSV (comma-separated values).
eg:
"quantity","colour","condition","item","category","diameter(mm)","price per unit"
5,"white","used","ball","golf",43,50
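To see why this form is machine readable, here is a minimal sketch that reads the same two CSV lines with Python's standard `csv` module. The variable names and the total-price calculation are assumptions added for illustration only.

```python
import csv
import io

# The header row and the data row from the example, as one CSV text.
raw = (
    '"quantity","colour","condition","item","category","diameter(mm)","price per unit"\n'
    '5,"white","used","ball","golf",43,50\n'
)

# DictReader uses the first row as column names.
# skipinitialspace=True tolerates a space after each comma,
# which hand-typed CSV often contains.
reader = csv.DictReader(io.StringIO(raw), skipinitialspace=True)
rows = list(reader)
first = rows[0]

# CSV values arrive as text, so numbers must be converted explicitly.
quantity = int(first["quantity"])
price_per_unit = int(first["price per unit"])
total_price = quantity * price_per_unit

print(first["colour"], first["condition"], first["category"])
print("total price:", total_price)
```

Each value now has a name and a position, so the computer can pick out the quantity or the price without having to understand a sentence.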



FROM DATA TO INFORMATION TO KNOWLEDGE:

Colour            White
Category          Sport – Golf
Condition         Used
Diameter          43 mm
Price (per ball)  50 rupees
But each of these data values is rather meaningless by itself. To create information out of data, we need to interpret the data.
Let's take the size. A diameter of 43 mm doesn't tell us much. It is only meaningful when it is compared to other things. In sports there are often size regulations for equipment. The minimum diameter of a competition golf ball is 42.67 mm (1.680 inches). Good, we can use that golf ball in a competition. This is information. But it is still not knowledge. Knowledge is created when information is learned, applied and understood.
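The step from data to information can be sketched as a simple comparison. The function name `meets_minimum` is hypothetical, and the limit used here (1.680 inches converted to millimetres) is an assumption of this sketch, so swap in whichever regulation figure applies to you.

```python
def meets_minimum(diameter_mm, min_diameter_mm):
    """Return True when the ball is at least as large as the required minimum."""
    return diameter_mm >= min_diameter_mm


# Assumed regulation limit: 1.680 inches, converted to millimetres.
MIN_DIAMETER_MM = 1.680 * 25.4

# The data value on its own.
ball_diameter_mm = 43

# The data value interpreted against a reference: this is the information.
usable_in_competition = meets_minimum(ball_diameter_mm, MIN_DIAMETER_MM)
print("usable in competition:", usable_in_competition)
```

The number 43 did not change. What changed is that it was compared with something, and that comparison answers a question.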

Wednesday, 22 July 2015

LEARNING WITH 'R'

WHAT IS R?

R is a programming language and software environment for statistical computing and graphics. During the last decade, momentum from both academia and industry has made R one of the most important tools for computational statistics, visualization and data science. Worldwide, statisticians and data scientists use R to solve challenging problems in fields ranging from computational biology to quantitative marketing. R has become one of the most popular languages for data science and is widely used in finance and at analytics-driven companies such as Google, Facebook and LinkedIn.

INTRODUCTION TO R

To learn R, the best way to start is with DataCamp. It has courses that teach R from the basics: an introduction, vectors, matrices, factors, data frames, lists and so on. They give us a flavour of R and keep us going. Below you can download R and RStudio. The RStudio IDE (integrated development environment) is a powerful and productive user interface for R.

DOWNLOAD

R

RStudio

   

Sunday, 19 July 2015

INTRODUCTION TO DATA ANALYTICS


Data analysis is a vast field, and introducing it properly would take a long time. Thanks to my professor, Mr. Prithwis Mukerjee, and his lucid presentations, it looks simple.

Saturday, 4 July 2015

FIRST POST

For the first post, we go with the name: why is data a sceptre?

A sceptre is an ornamental rod carried by rulers as imperial insignia. So the question is: why is data a sceptre?

               

We have generated more than 90% of our data in the last three years alone. We are past the megabyte (10^6 bytes), the gigabyte (10^9 bytes) and the terabyte (10^12 bytes), and today data scientists use the yottabyte (10^24 bytes) to describe amounts of data.
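To get a feel for the jump between these units, here is a small sketch using the decimal (power of ten) definitions quoted above. The `UNITS` table and the `convert` helper are names made up for this example.

```python
# Decimal byte units, as powers of ten.
UNITS = {
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "yottabyte": 10**24,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount of data from one unit to another."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# How many terabytes fit into a single yottabyte?
terabytes_per_yottabyte = UNITS["yottabyte"] // UNITS["terabyte"]
print("terabytes in one yottabyte:", terabytes_per_yottabyte)
print("megabytes in one gigabyte:", convert(1, "gigabyte", "megabyte"))
```

One yottabyte is a trillion terabytes, which is why the smaller units stop being useful at that scale.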
Yesterday in analytics class I got a keen insight into what a data scientist should do.

When Steve Jobs was given the first prototype of Apple's iPod, he dropped it in an aquarium and used the air bubbles to prove that there was empty space inside and that it could be made smaller.
Moving forward is good; thinking beyond that is what makes you unique.
Imagine something which does not exist, and work to make it a reality.
Create something beautiful from the raw material of data, and let it make the world a better place.
For a data scientist, his mind is his employer and his work is his employee.
And all of this goes back to one simple word: 'data'. So, data is a sceptre.