Everything you ever wanted to know about
readability tests but were afraid to ask.
Why are we looking at readability tests?
The use of readability tests in the plain language process is a controversial
topic. Now that readability scores are easy to obtain by using computerized
grammar and style checking software programs, there is new pressure to
adopt them. While some people use readability tests to help them make
their writing plainer, other people are fervently opposed to their use.
For example, ten years ago the International Reading Association and
the U.S. National Council of Teachers of English were advising members
against uncritical use of readability tests to assess educational materials.
At about the same time, two government reports in England validated
the accuracy and reliability of the tests. Some of the disputes about
readability tests arise because people use them for different purposes,
and for purposes different from those that lay behind the development
of the tests.
We want to look at the original reasons for development of readability
tests, the historical development of the tests and the purposes to which
they are now put. From there, we can discuss how they ought to be used
in the plain language process.
What is readability?
Readability describes the ease with which a document can be read. Readability
tests, which are mathematical formulas, were designed to assess the suitability
of books for students at particular grade levels or ages.
The tests were intended to help educators, librarians and publishers
make decisions about purchase and sale of books. They were also meant
to save time, because before the formulas were used those decisions
were made on recommendations of educators and librarians who read the
books. These people were taking books already written and working out
which reading groups they suited.
Webster's defines "readable" as
- fit to be read,
- interesting,
- agreeable and attractive in style; and
- enjoyable.
Obviously, readability formulas cannot measure features like interest
and enjoyment. Also, when we ask whether text is understood by its reader,
what we are questioning is its "comprehensibility". Readability formulas
cannot measure how comprehensible a text is. And they cannot measure whether
a text is suitable for particular readers' needs.
A brief historical overview
The first formulas
Readability formulas were first developed in the 1920s in the United States.
From the earliest efforts to today, readability tests have been designed
as mathematical equations which correlate measurable elements of writing
with reading difficulty. Such elements include the number of personal
pronouns in the text, the average number of syllables in words and the
number of words in sentences.
Factors like these are usually described as "semantic" if they concern
the words used and "syntactic" if they concern the length or structure
of sentences. Both semantic and syntactic elements are surface-level
features of the text, and do not take into account the nature of
the topic or the characteristics of the readers.
Designers of one early formula began with 289 elements of content,
style of expression and presentation, format and organization and reduced
them down to the 5 style factors which could be counted most reliably
and would be most relevant to the needs of adults with limited reading
skills. Four of the five factors were:
- number of personal pronouns
- average number of words in a sentence
- percentage of different words
- number of prepositional phrases
How and why were they developed?
The very first readability study was a response to demands by junior high
school science teachers to provide them with books which let them teach
scientific facts and methods rather than get bogged down in teaching the
science vocabulary necessary to understand the texts. The earliest investigations
of readability were conducted by asking students, librarians, and teachers
what seemed to make texts readable.
The publication in 1921 of The Teacher's Word Book by
Thorndike provided a means for measuring the difficulty of words and
permitted the development of mathematical formulas. Thorndike tabulated
words according to the frequency of their use in general literature.
Later other word lists and reading lessons were adopted to measure word
difficulty. It was assumed that words that were encountered frequently
by readers were less difficult to understand than words that appeared
rarely. Familiarity breeds understanding. There is some soundness to
this. There are today more than 490,000 words in the English language
and another 300,000 technical terms. It is unlikely that an individual
will use more than 60,000 and the average person probably encounters
between 5,000 and 10,000 words in a lifetime.
Readability formulas today
How do they work?
Readability formulas measure certain features of text which can be subjected
to mathematical calculations. Not all features that promote readability
can be measured mathematically. And these mathematical equations cannot
measure comprehension directly. Readers can be questioned or tested on
material they have read and the material can be tested with formulas.
The readers' success in understanding the material as measured on an exam
can be correlated to the readability score of the text itself. This is
one method to validate the formulas.
Other features of a document are just as important as word length
and sentence length in determining reading ease. Other aspects of language,
sentence structure, and organization of ideas are significant to comprehension.
Physical aspects of the document are also important: type styles, layout,
design, use of graphics and so on.
Other features of clear writing are:
- use of language that is simple, direct, economical and familiar
- omission of needless words
- use of sentence structures that are evident and unambiguous
- organization and structure of material in an orderly and logical
way
So readability formulas are considered to be predictions of reading ease
but not the only method for determining readability. And they do not help
us evaluate how well the reader will understand the ideas in the text.
What factors do they measure?
Today readability formulas are usually based on one semantic factor
(the difficulty of words) and one syntactic factor (the difficulty of
sentences). Studies have confirmed that including other factors in the
formula adds more work than accuracy. Put another way, counting more
things does not make the formula any more predictive of reading ease
but takes a lot more effort.
Words are either measured against a frequency list or are measured
according to their length in characters or syllables. Sentences are
measured for the average length in characters or words.
Graphs, charts and computer functions
Readability tests can be performed manually by counting and doing a
mathematical calculation, or by referring to a chart or graph. They
can also be performed by computer. Most grammar or editing software
today can perform several readability tests.
Fog Index:
(1) Divide the total number of words by the total number of sentences.
This gives the average number of words per sentence.
(2) Divide the number of words with three or more syllables by the total
number of words and express the result as a percentage. This gives the
percentage of difficult words.
(3) Add these two figures (1 and 2) and multiply that total by 0.4.
This figure (3) is the Fog Index in years of education.
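The Fog Index arithmetic can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the function names are
invented for this example, syllables are estimated crudely by counting
groups of vowels, sentences are split on periods, question marks and
exclamation marks, and a "difficult" word is taken to be one of three or
more syllables, which is the usual statement of Gunning's rule. Real
software makes its own choices on each of these points.

```python
import re

def count_syllables(word):
    """Rough estimate (an assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fog_index(text):
    """Return the Fog Index of `text` in years of education."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # (1) average number of words per sentence
    words_per_sentence = len(words) / len(sentences)
    # (2) percentage of difficult words (three or more syllables)
    difficult = [w for w in words if count_syllables(w) >= 3]
    percent_difficult = 100 * len(difficult) / len(words)
    # (3) add the two figures and multiply by 0.4
    return 0.4 * (words_per_sentence + percent_difficult)

print(fog_index("The cat sat on the mat. The dog ran."))
```

The sample has nine words in two sentences and no difficult words, so
the score is 0.4 x 4.5, or about 1.8 years of education.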
Flesch Scale: The Flesch Reading Ease Scale is the most widely
used formula outside of educational circles. It is the easiest formula
to use, and it makes adjustments for the higher end of the scale. It
measures reading ease from 100 (easy to read) to 0 (very difficult
to read). A zero score indicates that the text averages more than 37
words per sentence and more than 2 syllables per word.
Flesch has identified a "65" as the Plain English Score. In response
to demand, Flesch also provided an interpretation table to convert the
scale to estimated reading grade and estimated school grade completed.
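As a rough illustration, the Flesch Reading Ease score can be computed
from three counts. The constants below (206.835, 1.015 and 84.6) are the
ones commonly published for Flesch's formula; they are not given in the
text above, and the function name is invented for this sketch. Counting
the syllables is left to the caller, since programs differ in how they
do it.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease: about 100 is easy, about 0 is very difficult."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# A 100-word sample with 5 sentences and 150 syllables scores about 59.6.
print(round(flesch_reading_ease(100, 5, 150), 1))

# 37 words per sentence and 2 syllables per word lands almost exactly on zero.
print(round(flesch_reading_ease(37, 1, 74), 2))
```

The second call shows where the zero point described above comes from:
at 37 words per sentence and 2 syllables per word, the two penalties use
up nearly the whole 206.835 points.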
Developments
In 1963 Fry published his readability graph, which was easier to use
than manual computations. The graph was revised in 1977 and then became
the most widely used formula. A hand-held calculator was developed to do
the Fry test, and now it is incorporated in computer programs.
Computerization
Also in 1963, the first computerized readability formula was developed
and many others have been devised since. Some computer formulas are
based on characters per word and characters per sentence while others
measure syllables. The differences between computerized measures today
depend on the developers' decisions about how to measure sentences or
words. For example, some programs treat a period, colon, or semi-colon
as the sign of the end of a "sentence". This is in keeping with some
research which concludes that the sentence is not the right unit of
measure. Rather the "sousphrase", which we might consider to be a clause,
represents the unit of thought for measure because it is the cognitive
decoding unit.
Software programs
Today most grammar software programs provide more than one readability
measure as well as comparisons to well-known writing. In addition to
word, sentence and paragraph statistics, Grammatik IV gives the Flesch
Readability Scale, Gunning's Fog Index in years of education, and the
Flesch-Kincaid Reading Grade Level. In addition to a qualitative assessment
of the writing, Stylewriter, a plain-English editorial program, provides
word and sentence statistics with an index percentage of the passive
verbs used as well as a count of words in various categories: complex,
jargon, abstract, legal, tautologies and so on.
What is Cloze procedure?
The "cloze" procedure for testing your writing is often treated as a readability
test because a formula exists for translating the data from "cloze tests"
into numerical results. The name "Cloze" comes from the word "closure".
In this procedure, words are deleted from the text and readers are asked
to fill in the blanks. By constructing the meaning from the available
words and completing the text, the reader achieves "closure". The
procedure is described in more detail below.
In 1953 the "cloze procedure" was developed and later, after 1965,
formulas were developed for its use. It became a popular method for
measuring the suitability of text for a particular audience. It was
popular because its scoring was objective; it was easy to use and analyze;
it used the text itself for analysis; and it yielded high correlations
to other formulas.
The cloze technique does not predict whether the material is comprehensible;
it is an actual try-out of the material. It tells you whether a particular
audience group can comprehend the writing well enough to complete the
cloze test.
Cloze procedure consists of deleting words in a text and asking the
reader to fill in the appropriate or a similar word. Usually every fifth
word is deleted. Cloze is thought to offer a better index of comprehensibility
than the statistical formulas. The ability to identify the missing word
or to insert a satisfactory substitute for the original word indicates
that the reader comprehends the content of the text.
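The deletion step described above is mechanical enough to sketch in
Python. The function names here are invented for illustration, and the
sketch makes two simplifying assumptions: words are split on spaces, so
punctuation stays attached to the deleted word, and scoring counts only
exact matches, whereas in practice a satisfactory substitute may also be
accepted.

```python
def make_cloze(text, n=5, blank="_____"):
    """Blank out every n-th word; return the test text and the answers."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers

def score_cloze(answers, responses):
    """Percentage of blanks filled with exactly the original word."""
    if not answers:
        raise ValueError("the text is too short to contain any blanks")
    correct = sum(a.lower() == r.lower() for a, r in zip(answers, responses))
    return 100 * correct / len(answers)

test_text, answers = make_cloze(
    "The quick brown fox jumps over the lazy dog again today"
)
print(test_text)   # The quick brown fox _____ over the lazy dog _____ today
print(answers)     # ['jumps', 'again']
print(score_cloze(answers, ["jumps", "now"]))   # 50.0
```

What percentage counts as a successful completion is left open here; the
code only reports the score.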
Cloze testing has been called a "rubber yardstick" because cloze scores
reflect both the difficulty of the text and the reader's abilities or
resources. As with any readability test, the problem arises over what is
considered a successful completion of the text: inserting 50% of missing
words, 75% or 100%. Today educators recognize that cloze procedures
are more suitable to assess readers' abilities than to measure the readability
of text. Critics have pointed out that cloze can operate on the basis
of measuring redundancy -- that in some texts it measures the number
of redundant words rather than implicit words.
In particular, critics suggest that cloze is inappropriate for measuring
text or readers' abilities in a language other than the readers' native
language. The results of cloze testing reflect the reader's basic intuition
about the structure and vocabulary of the target language -- and that
does not exist for the language student.
Cloze testing is widely used now to assess the abilities of readers,
but is usually combined with other tests measuring grammar skills and
writing ability. One educator comments:
The underlying assumption in cloze testing is that a close
relationship exists between reading comprehension and writing skill.
The test measures the student's ability to select appropriate words
if occasional gaps occur in a passage, based on their ability to infer
meaning from context and cultural experience. The word cloze is related
to the concept of closure, the human tendency to complete a partly finished
pattern, to pick out key words and rely on language repetition in English
discourse. The theory originated in Gestalt psychology and assumes
that in figuring out the missing word, the mind goes through a process
of sampling, predicting, testing, and confirming the appropriate word
choice. The argument is that this process involves both recognition
skills (required in discrete formal testing) and the production of a
significant content (required in written passages). In theory at least,
the cloze test is an integrated rather than a formal test, but the advantage
is that it can be marked efficiently and objectively.
("Assessment Report, Communications Discipline", by Roslyn Dixon, Communications
Assessment Coordinator, Douglas College, June 1, 1989)
One critic discussed Cloze in the context of its use in languages other
than English:
There is controversy regarding the use of cloze procedure
in determining the readability of written materials. This controversy
is based on the fact that cloze is a subjective evaluation that mirrors
the language ability and background of information of the person taking
the test. Also, some researchers feel that multiple cloze passages should
be developed from each piece of material for the results to be valid.
For example, a test deleting every fifth word should be prepared in
five versions, omitting a different word each time. Though these views
are shared by other countries, for want of a better technique, cloze
procedure is widely used.
(Annette T. Rabin, "Determining Difficulty Levels of Text Written in
Languages Other than English", in Zakaluk and Samuels, pp. 46-76)
Should you use readability formulas in your writing?
What some of the opinion-makers say:
Readability formulas measure word length or frequency and sentence length.
In using the formulas we accept that these features affect readability
in a significant way.
Yet it can be argued that long sentences and difficult words are merely
signals that the text is not written for ease of understanding. Some
say difficult text often contains difficult words because it discusses
abstract ideas while easy text uses common words because it discusses
concrete experiences. Choosing smaller words and shorter sentences may
not be as much help as reconstructing the sentences and using familiar
vocabulary.
The Delegates Assembly of the International Reading Association
resolved against using grade-level scores in 1981. And the (U.S.) National
Council of Teachers of English advises against uncritical use of readability
formulas in assessing text for school use. After 1981, the College Entrance
Examination Board decided not to use grade-level measures to ascertain
reading abilities of college applicants.
In recent years, researchers have emphasized that readability tests
can only measure the surface characteristics of text. Qualitative factors
like vocabulary difficulty, composition, sentence structure, concreteness
and abstractness, obscurity and incoherence cannot be measured mathematically.
They have pointed out that material which receives a low grade-level
score may be incomprehensible to the target audience. As an example,
they suggest that you consider what happens if you scramble the words
in a sentence or, on a larger scale, randomly rearrange the sentences
in a whole text. The readability score could be low, but comprehension
would be lacking.
example: Fall Humpty had Dumpty great a.
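The point can be checked with a short Python sketch. The function names
are invented for illustration, and the three counts used here (words,
sentences and letters) stand in for whatever surface features a given
formula relies on. Because shuffling changes only the order of the
words, every such count, and so every score built from them, stays the
same.

```python
import random
import re

def surface_counts(text):
    """Return (words, sentences, letters), the kind of counts formulas use."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(w) for w in words)
    return len(words), len(sentences), letters

def scramble(sentence, seed=0):
    """Shuffle the words of one sentence, keeping its end punctuation."""
    body = sentence.rstrip(".!?")
    ending = sentence[len(body):]
    words = body.split()
    random.Random(seed).shuffle(words)
    return " ".join(words) + ending

original = "Humpty Dumpty had a great fall."
shuffled = scramble(original)

print(shuffled)
print(surface_counts(original) == surface_counts(shuffled))   # True
```

A formula sees six words, one sentence and 25 letters in both versions,
although only one of them can be understood.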
Things they can do
- Their primary advantage is they can serve as an early warning system
to let the writer know that the writing is too dense. They can give
a quick, on-the-spot assessment. They have been described as "screening
devices" to eliminate dense drafts and give rise to revisions or substitutions.
- In some organizational settings, readability tests are considered
useful to show measurable improvement in written documents. They provide
a quantifiable measure of improvement or simplification.
Things they can't tell you and why
- how complex the ideas are
- whether or not the content is in a logical order
- whether the vocabulary is appropriate for the audience
- whether there is a gender, class or cultural bias
- whether the design is attractive and helps or hinders the reader
- whether the material appears in a form and type style that is easy
or hard to read
Because the readability formulas are based on measuring words and sentences,
they cannot take into account the variety of resources available to different
readers. Reader resources are word recognition skills, interest in the
subject, and prior knowledge of the topic. The formulas cannot measure
the circumstances in which the reader will be using the text or form,
both psychological and physical. The formulas cannot adjust for the needs
of people for whom the text is written in a second or additional language.
Studies have shown that readability, interest and prior knowledge
in the reader are equally important factors in comprehension and retention
of information. The ease of reading that the reader experiences is also
directly influenced by the writer's use of physical, syntactic, semantic
and contextual cues which cannot be measured by these tests. Such cues
include the use of personal pronouns, the layout and design of the
text, the typography (use of highlighting and italics, etc.), the use
of signal words (now, then, but, later) and so on.
Readability tests cannot tell you whether the information in the text
is written in a way to interest the reader, nor can they tell you whether
the reader has sufficient background information to appreciate the new information
provided in the text.
How to use readability tests
Researchers have been critical of using readability tests on readers of
an additional language. They point out that these tests cannot take into
account that we mentally process our first language differently than we
do additional languages we have acquired. Therefore such a reader does not
approach the text with the intuition for the language that native users
have. This is important when using cloze testing
on text intended for people reading in an additional language. It is also
significant when designing the testing groups for cloze tests or try-outs
of the material. A population which meets the same criteria for first
language must be used to accurately assess the readability of material
written in a second or additional language.
Keep the readability formula out of the writing process itself.
Follow other guidelines for writing. If you like to work with guidelines
in checklists, use the Document Design Centre's Guidelines, the CBA/CBA
Guidelines, the CLIC Red Alert Editing System or Fry's Writeability
Checklist.
Use the formulas for feedback only:
Write -- apply the formula -- revise -- test.
Remember that the readability test is only a screen: the score is only
a prediction that the text is suitable for a particular reading grade.
Remember too that the formulas do not take into account other features
which contribute to comprehension, so they may underestimate or
overestimate the suitability of the material.
Bear in mind that at higher grade levels the scores are not reliable
because background and content knowledge become more significant than
style variables.
Consider again the purpose of the text. Material which is intended
for training readers can be more challenging to their resources than
material whose purpose is to inform or entertain. As well, higher motivation
in the readers may keep them reading challenging material which they
might otherwise abandon out of frustration.
Pick a formula that works best for you and for the task at hand. Choose
one that is easy to use. It should contain two variables, whether words
and sentences or characters per word and characters per sentence. For
significant projects, use more than one test and expect slightly different
grade level scores.
Test a large sample of the text or the whole text if using a computer
program. By hand, test at least 3 sections of 100 words to arrive at
an average score. Be cautious about averaging if there are great
differences between sections of the text.
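The hand-sampling advice can be sketched in Python as follows. This is
one possible reading, not a prescribed method: the function names are
invented, the passages are taken evenly spaced from the start, middle
and end of the text, and `score_passage` stands for whichever formula
you have chosen. The spread between the highest and lowest passage
score is returned alongside the average so that large differences
between sections are visible.

```python
from statistics import mean

def sample_passages(words, passages=3, size=100):
    """Pick evenly spaced passages of `size` words from a list of words."""
    if len(words) < size:
        raise ValueError("the text is shorter than one passage")
    if passages == 1:
        return [words[:size]]
    last_start = len(words) - size
    starts = [round(i * last_start / (passages - 1)) for i in range(passages)]
    return [words[s:s + size] for s in starts]

def averaged_score(words, score_passage, passages=3, size=100):
    """Return (average score, spread between highest and lowest passage)."""
    scores = [score_passage(p) for p in sample_passages(words, passages, size)]
    return mean(scores), max(scores) - min(scores)

# Example with a stand-in measure: average word length in each passage.
text_words = ("plain words help every reader " * 100).split()
average, spread = averaged_score(
    text_words, lambda p: sum(len(w) for w in p) / len(p)
)
print(average, spread)
```

With a text shorter than three full passages the samples overlap, so the
average then carries less information than it appears to.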
Combine the use of formulas with other methods of testing
There are other methods for assessing text for suitability for readers.
You can devise a document audit instrument which takes into account
other characteristics that formulas cannot predict. Prepare a questionnaire
to use when reviewing the document, to seek out features known to make
reading easier.
Or use experts. In education it is common to use teachers and librarians
to review material and assign an appropriate grade level for the use
of the text. In other fields, find experts who will know the needs and
characteristics of your audience and get their expert opinions.
Or use "protocol-aided revisions" as a method. These are "try-outs"
on individuals or small groups who match your audience's key characteristics.
Formal testing with focus groups is often beyond the budget and capabilities
of those preparing materials. But informal, or casual, testing of materials
with readers is very effective even on a small scale.
Summing up
Readability formulas are not guides to writing well. The notion of "writing
to formula" has been condemned by formula designers from the beginning.
They call it "cheating" and compare it to holding a match under a thermometer
to warm a room. Klare has said that formulas can play a useful screening
role in the prediction of readability, where only index variables in language
are needed. But formulas cannot be used in the production of readable
writing, because index variables are insufficient for the purpose. For
producing readable writing more variables must be considered in both the
text and the reader.
(Klare, "A Second Look at the Validity of Readability Formulas", Journal
of Reading Behaviour, 1976, 8, 129-152, and present reference)
Sources used
Readability: Its Past, Present, & Future, Beverly L. Zakaluk and
S. Jay Samuels, editors, published by the International Reading
Association, Newark, Delaware, 1988
Small Claims Court Materials: Can They Be Read? Can They Be
Understood?, by Richard Darville and Marilyn Hiebert, Canadian
Law Information Council, CLIC Papers on PLEI, no. 7, 1985