Analyzing Data in Excel

There are a variety of methods that may be utilized to analyze data. Many statistical packages are available, but oftentimes data can also be analyzed in Excel.

Data Analysis Example


Month

Name

Gender

Diagnosis

Treatment

May

Jessica

F

Allergy

Eye Drops

May

Sam

M

Allergy

Eye Drops

May

Wes

M

Cataract

Cataract Surgery

May

Rachel

F

Ptergyium

Eye Drops

May

Lily

F

Allergy

Eye Drops

May

Hannah

F

Cataract

Cataract Surgery

May

Denise

F

Allergy

Eye Drops

May

Sharon

F

Allergy

Eye Drops

May

Robin

F

Allergy

Eye Drops

May

Lianna

F

Ptergyium

Eye Drops

May

Thomas

M

Presbyopia

Reading Glasses

May

Kimberly

F

Refractive Error

Distance Glasses

May

Michael

M

Refractive Error

Distance Glasses

May

Jacob

M

Conjunctivitis

Eye Drops

June

John

M

Presbyopia

Reading Glasses

June

Tim

M

Refractive Error

Distance Glasses

June

Allison

F

Cataract

Cataract Surgery

June

Laura

F

Ptergyium

Eye Drops

June

Scott

M

Cataract

Cataract Surgery

June

Sarah

F

Ptergyium

Eye Drops

June

Alex

M

Ptergyium

Eye Drops

June

Robbert

M

Cataract

Cataract Surgery

There are many ways to analyze data in excel. Using the above database as an example, we will examine several methods. The easiest way is to use the sort function. This is quick for smaller databases, but this will not be as helpful for larger database. For this database above, let us pretend that we want to count the number of people with allergies and how many are male and how many are female. To do this, we can sort by diagnosis, and then by gender:

Here are the commands using Excel 2007 for Windows:

First, select your data (control A selects all). On the top of the Excel tool bar, choose the “data” tab. Then, click the sort function (circled below in yellow). In the window that pops up, click sort by “diagnosis”. To sort again by gender, click the button in the left hand corner of the window that says “add level”. Then, click “gender” and then the “ok” button.  (see picture below)

1

Sorting is a great tool to see trends, and to analyze small amounts of data. In this case, when we sort by diagnosis and then by gender, we can simply count the number of people with each diagnosis and record the gender breakdown. However, with data sets which consist of thousands of people, counting would be very tedious, and Pivot Tables are a great alternative.

Using Pivot Tables in Excel

Pivot Tables automatically will sort data for you and list values. To create a pivot table, select your data, click on the “insert” tab, and then select “pivot table”. (For Macs click on the data tab and then “pivot table”).

2

The Pivot Table should then open in a new sheet of the excel file. Then, we must add values. On the right hand side, there is a box that says “choose field reports to add”. Since we want to first sort by diagnosis, drag the diagnosis label (the one with the check box next to it) into the “row label” box. It should look like this:

3

Now we want to know how many people had each diagnosis. To do this, we drag “diagnosis” (the one with the check box next to it) to the box with the heading “values”.  It should look like this:

4

To sort by gender, we just drag “gender” to the box with the heading row labels, and we will get an automatic breakdown, which can be used to calculate %ages, create graphs, etc.

5

If you prefer to sort first by gender, and then by diagnosis, then switch the order of gender and diagnosis in the row label box.


This is just one example of how pivot tables can be used. Fields can be added or removed as necessary.

Hints For Analyzing Data

When using the sort function, or Pivot Tables, the data must be “cleaned”. This means that you must go through and make sure that there is consistency in the data among columns. In this case, for diagnoses, it is important to make sure that only one word (or abbreviation) is used to describe each diagnosis. If there are multiple words, for example, say “allergy” is written in the database as “allerg”, “allergy”, and “allergies” this will make the analysis more difficult, so pick one and stick with it. If there are multiple diagnoses, it is also easier if they are not lumped together, such “allergy and cataract”, or “refractive error, presbyopia.” To avoid this problem, it is important to separate the diagnoses out, and if there are multiple diagnoses, then create additional columns, such as diagnosis 2 and diagnosis 3. Then place one diagnosis in each column. (If this is the case you may have to create multiple pivot tables and manually add the results together).

The Importance of Reporting All Results

When analyzing data, it is critical to report all results, even if they seem insignificant. It is also essential to not lump data analyses together and make generalizations. Let’s use the example of a researcher conducting a study on the effectiveness of a visual aid to increase knowledge of cataracts. To test the effectiveness of the visual aid, the researcher administers a 10-question survey to patients before they are shown the visual aid, and then again after. The researcher finds that the visual aid increases the overall number of questions answered correctly. This is a good start, but it is not enough. It is critical that the researcher analyze the results of each individual question. Just knowing that the intervention increases overall knowledge provides little information, since it is important to know the strengths and weaknesses of the intervention. Perhaps the intervention caused a significant increase in the number of people understanding what a cataract is, but not the number of people understanding proper post-operative procedures. This is important information to know because the intervention can then be modified to better convey the necessary information.