
Saturday, 8 August 2015

PYTHON PROGRAMMING - 1 (RADISH SURVEY)

We have 300 lines of survey data in the file radishsurvey.txt. Each line consists of a name, a hyphen, and then a radish variety. For example, in one line Evie Pulsford is the name and April Cross is the radish variety.


Now comes the business side of analysing the data. We decided to find out:
* Did anyone vote twice?
* Which varieties are the least popular?
* What is the most popular radish variety?
We do this in Python, using the Spyder IDE (Integrated Development Environment).

We save the file radishsurvey.txt in the Documents -> Python Scripts folder.
We open a new file in Spyder and name it radishsurvey.py (.py is the extension for Python files).
First we write code to find out who voted for which radish variety, using a for loop.
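The loop can be sketched roughly as follows. This is a minimal sketch, not the exact code from the post: it assumes the name and the variety are separated by " - " (space, hyphen, space), and it uses a short list of sample lines in place of the real file so that it runs on its own. The function name describe_votes is our own choice.

```python
# Stand-in for the lines of radishsurvey.txt.
# Assumption: name and variety are separated by " - ".
sample_lines = [
    "Evie Pulsford - April Cross\n",
    "Amy Clunie - White Icicle\n",
]


def describe_votes(lines):
    """Return one readable sentence per survey line."""
    sentences = []
    for line in lines:
        line = line.strip()                # drop the trailing newline
        name, variety = line.split(" - ")  # name before the hyphen, variety after
        sentences.append(name + " likes " + variety)
    return sentences


for sentence in describe_votes(sample_lines):
    print(sentence)

# With the real file, the same function can read the lines directly:
# with open("radishsurvey.txt") as survey_file:
#     for sentence in describe_votes(survey_file):
#         print(sentence)
```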
The output appears in the console on the right side of Spyder. It is simply each line of the file rewritten as a readable sentence. Next, we see how many people voted for "White Icicle" radishes.
This gives lines like "Amy Clunie likes White Icicle". The next step is counting the votes.
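Filtering for one variety and counting its votes might look like the sketch below. The sample lines are made up for illustration (only Evie Pulsford and Amy Clunie come from the survey), and the " - " separator is an assumption about the file format.

```python
# Made-up sample lines; "Test Voter" is a hypothetical name.
sample_lines = [
    "Evie Pulsford - April Cross",
    "Amy Clunie - White Icicle",
    "Test Voter - White Icicle",
]

white_icicle_count = 0
for line in sample_lines:
    name, variety = line.strip().split(" - ")
    if variety == "White Icicle":
        print(name + " likes White Icicle")
        white_icicle_count += 1  # one more vote for White Icicle

print("White Icicle votes:", white_icicle_count)
```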
We counted the votes for White Icicle: 59 votes. To count the votes for other varieties, there is no need to write the same code every time. A generic function can be defined once and called wherever necessary. We defined a count_votes function and used it to find the votes for two other varieties.
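A generic function of this kind could be sketched as below. The name count_votes comes from the post, but its parameters here are an assumption: this version takes the lines as an argument instead of opening the file itself, so the example runs without radishsurvey.txt. The sample lines are made up.

```python
def count_votes(lines, radish):
    """Count how many lines vote for the given radish variety."""
    count = 0
    for line in lines:
        name, variety = line.strip().split(" - ")  # assumed " - " separator
        if variety == radish:
            count += 1
    return count


# Made-up sample lines; "Test Voter" is a hypothetical name.
sample_lines = [
    "Evie Pulsford - April Cross",
    "Amy Clunie - White Icicle",
    "Test Voter - White Icicle",
]

print("White Icicle:", count_votes(sample_lines, "White Icicle"))
print("April Cross:", count_votes(sample_lines, "April Cross"))
```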
Next, we count all the votes.
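One way to count every variety in a single pass is to keep a dictionary that maps each variety to its number of votes. The sketch below assumes the " - " separator and uses made-up sample lines. The function name count_all_votes is hypothetical.

```python
def count_all_votes(lines):
    """Return a dictionary mapping each variety to its vote count."""
    counts = {}
    for line in lines:
        name, variety = line.strip().split(" - ")
        if variety not in counts:
            counts[variety] = 0  # first vote seen for this variety
        counts[variety] += 1
    return counts


# Made-up sample lines; "Test Voter" is a hypothetical name.
sample_lines = [
    "Evie Pulsford - April Cross",
    "Amy Clunie - White Icicle",
    "Test Voter - White Icicle",
]

counts = count_all_votes(sample_lines)
print(counts)
```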
Yes, we have counted all the votes, but the output looks clumsy, so we make it easier for people to read. Programmers call this "pretty printing". Instead of a single print(counts), we use a for loop that prints the name of each variety and the number of votes it received.
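A pretty-printing loop could look like this sketch. The counts dictionary holds made-up numbers for illustration, not the real survey results, and format_counts is a hypothetical helper name.

```python
# Made-up counts for illustration, not the real survey results.
counts = {"White Icicle": 2, "April Cross": 1}


def format_counts(counts):
    """Return one 'variety: votes' string per variety."""
    rows = []
    for variety, votes in counts.items():
        rows.append(variety + ": " + str(votes))
    return rows


for row in format_counts(counts):
    print(row)
```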
Now we notice some weird entries in our vote count. There is "red king" and there is also "Red king". To a computer, "red king" and "Red king" look different because of the different capitalisation. We need to clean up (sometimes called "munge") the data so it all looks the same.
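One possible clean-up is to strip surrounding whitespace and apply the string method capitalize(), which upper-cases the first letter and lower-cases the rest. This is a sketch of that idea. The helper name clean_variety is hypothetical, and the post may have used a different method.

```python
def clean_variety(variety):
    """Normalise whitespace and capitalisation of a variety name."""
    return variety.strip().capitalize()


# Differently written versions of the same variety.
raw_varieties = ["red king", "Red king", " Red King "]

for raw in raw_varieties:
    print(repr(raw), "->", repr(clean_variety(raw)))
```

After this step all three spellings become the same string, so they are counted as one variety.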
But some lines also have double spaces between the first and last names, so we clean up the data again. Then we check whether anyone voted twice.
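Both steps can be sketched together: collapse repeated spaces in each name, then remember the names already seen. The sample lines are made up ("Test Voter" is a hypothetical name), the " - " separator is an assumption, and find_repeat_voters is our own function name.

```python
def find_repeat_voters(lines):
    """Return the names that appear more than once, after cleaning spaces."""
    seen = set()
    repeats = []
    for line in lines:
        name, variety = line.strip().split(" - ")
        name = " ".join(name.split())  # collapse runs of spaces into one
        if name in seen:
            repeats.append(name)
        else:
            seen.add(name)
    return repeats


# Made-up sample lines; the second one has a double space in the name.
sample_lines = [
    "Test Voter - White Icicle",
    "Test  Voter - April Cross",
    "Evie Pulsford - April Cross",
]

print("Voted more than once:", find_repeat_voters(sample_lines))
```

Without the space clean-up, "Test Voter" and "Test  Voter" would be treated as two different people and the repeat vote would go unnoticed.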

We make the code easier to understand by breaking it down into functions and adding comments. For big programs, factoring the code this way is essential so that it is easier to understand and reuse.
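A factored version might be organised as in the sketch below, with one small function per job. The function names parse_line and tally are hypothetical, the " - " separator is assumed, and counting only a person's first vote is our assumption about how repeat votes should be handled.

```python
def parse_line(line):
    """Split one survey line into a cleaned (name, variety) pair."""
    name, variety = line.strip().split(" - ")
    name = " ".join(name.split())            # collapse repeated spaces
    variety = variety.strip().capitalize()   # unify capitalisation
    return name, variety


def tally(lines):
    """Count votes per variety, keeping only each person's first vote."""
    counts = {}
    voters = set()
    for line in lines:
        name, variety = parse_line(line)
        if name in voters:
            continue  # assumption: a repeat vote is ignored
        voters.add(name)
        counts[variety] = counts.get(variety, 0) + 1
    return counts


# Made-up sample lines; "Test Voter" is a hypothetical name.
sample_lines = [
    "Evie Pulsford - April Cross",
    "Amy Clunie - White Icicle",
    "Test Voter - red king",
    "Test  Voter - Red king",
]

print(tally(sample_lines))
```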
Now we come to the finale: finding the winner.
The winner is Champion (the name of a radish variety).
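Given a dictionary of vote counts, the most and least popular varieties can be picked out as in this sketch. The counts below are made up for illustration and are not the real survey totals. The function name most_and_least_popular is hypothetical.

```python
def most_and_least_popular(counts):
    """Return the winning variety and the list of varieties with fewest votes."""
    winner = max(counts, key=counts.get)  # variety with the highest count
    fewest = min(counts.values())
    least = [variety for variety, votes in counts.items() if votes == fewest]
    return winner, least


# Made-up counts for illustration, not the real survey totals.
counts = {"Champion": 3, "White Icicle": 2, "April Cross": 1, "Red king": 1}

winner, least = most_and_least_popular(counts)
print("Winner:", winner)
print("Least popular:", least)
```

The least popular result is a list because several varieties can tie for the fewest votes.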
