Why One and Two Do Not Make Three: Dictionary Form Revisited

The primary aim of the article is to compare the usefulness of paper and electronic versions of OALDCE7 (Wehmeier 2005) for language encoding, decoding and learning. It is explained why, in contrast to Dziemianko's (2010) findings concerning COBUILD6 (Sinclair 2008), but in keeping with her observations (Dziemianko 2011) with regard to LDOCE5 (Mayor 2009), the e-version of OALDCE7 proved to be no better for language reception, production and learning than the dictionary in book form.1 An attempt is made to pinpoint the micro- and macrostructural design features which make e-COBUILD6 a better learning tool than e-OALDCE7 and e-LDOCE5. Recommendations concerning further research into the significance of the medium (paper vs. electronic) in the process of dictionary use conclude the study. The secondary aim which the paper attempts to achieve is to present the status of replication as a scientific research method and justify its use in lexicography.


The usefulness of paper and electronic dictionaries
Electronic counterparts of printed monolingual English learners' dictionaries, available on CD-ROMs, online, or, increasingly often, on portable electronic devices, are taken for granted these days. Some of them appear to be quite close to their predecessors in book form (Rogers 1996, Nesi 1999). However, it is strongly stressed that e-dictionaries should not be just electronic remakes of existing printed dictionaries, but should rather be compiled from scratch as genuine electronic tools and take advantage of the wide array of technological possibilities (Nielsen and Mourier 2005: 110). Although contemporary electronic dictionaries, including those based on paper ones, do employ various functionalities offered by the electronic medium and/or Web technology, further improvements are suggested (Müller-Spitzer et al. 2011, Prinsloo et al. 2011, Lew: In press, Kwary 2012). While the compilation of electronic dictionaries for foreign learners of English independently of (or in place of) paper dictionaries might be just a matter of time, the coexistence of the two media at present raises the obvious question of their relative usefulness in different linguistic tasks.
There is a vast body of studies where the effectiveness of paper and electronic dictionary use is compared.2 Unfortunately, the results do not permit easy generalisation due to the wide range of user- and task-variables as well as the different functionalities and lexicographic data available in the diverse electronic dictionaries used in research. Worse yet, even when the design, dictionary and user differences are disregarded, hardly any general picture emerges.
First, as regards decoding, no effect of paper and electronic dictionary conditions was found by Nesi (2000), Kobayashi (2007), Koyama and Takeuchi (2007) and Chen (2010, 2012). Electronic dictionaries were, however, observed to significantly facilitate language reception by Osaki et al. (2003), Osaki and Nakayama (2004) and Dziemianko (2010). In the first two of the abovementioned studies, they also proved to significantly help in identifying contextually appropriate meanings.
Second, different conclusions also follow from the few studies where the influence of paper and electronic dictionaries on language production was tested. In the study by Chen (2010), the subjects were requested to formulate sentences with low-frequency words on the basis of the information found in dictionaries available on hand-held electronic devices and on paper. The results obtained in the encoding task did not depend on the dictionary used. In the study by Dziemianko (2010), in turn, the results from the production task, which consisted in supplying prepositions missing from sentences, were significantly better in the group working with the online version of COBUILD6 than in the one consulting COBUILD6 on paper.
Third, conclusions from studies concerned with the role of paper and electronic dictionaries in vocabulary retention are no less confusing. On the one hand, there are investigations which point to no significant effect of the medium on retention (Koyama and Takeuchi 2003, Osaki et al. 2003, Osaki and Nakayama 2004, Kobayashi 2007, Xu 2010, Chen 2010, 2012). On the other hand, there are also those where the medium proved consequential in this respect. The research conducted by Koyama and Takeuchi (2004) revealed that paper dictionary use resulted in better retention than reference to a portable electronic dictionary. Dziemianko (2010), by contrast, concluded that the consultation of COBUILD6 online resulted in better retention of meaning and collocations than the use of the dictionary in book form. Interestingly enough, the authors of both studies refer to the Involvement Load Hypothesis to account for their findings. Koyama and Takeuchi (2004) suppose that the more demanding process of paper dictionary search is beneficial to retention, in line with the assumption that greater effort means deeper processing, which stimulates retention. Dziemianko (2010), in turn, presumes that the saliency of a dictionary entry on the computer screen as well as the lack of distractions in the form of entries irrelevant to the task at hand, which are bound to be seen on the page of a paper dictionary, induce the cognitive involvement which enhances retention.
Finally, even the replication of a study on the usefulness of paper and electronic dictionaries yields results divergent from those obtained in the original investigation. Dziemianko (2011) adopted the same conditions as those in her previous study (Dziemianko 2010), except for the dictionary. Instead of COBUILD6, the paper and free online versions of LDOCE5 were offered for consultation. Importantly, the subjects who comprised the other sample were as proficient in English and as familiar with paper and electronic dictionaries as those who used COBUILD6 (B2-C1 in CEFR). Despite the same tasks in both experiments, the results from the replication do not confirm previous conclusions. Whereas in the 2010 study it was found that the electronic medium enhanced reception, production and the retention of meaning and collocations, in the more recent investigation dictionary format proved to be inconsequential to the scores on the very same language tasks. In other words, success rates in encoding, decoding and retention were comparable across the two dictionary conditions, i.e., LDOCE5 on paper and online.
To account for the results, Dziemianko (2011) points out that in the free online version of LDOCE5 excessive noise in the form of colourful widgets or animated tower advertisements dwarfs lexicographic data. Such unsolicited (promotional) information in loud colours and different shapes must have diverted the subjects' attention away from dictionary information, which became less prominent and quite inconspicuous. Possibly, then, discerning lexicographic information and extracting it from the glutted website became no less difficult than locating it in a paper dictionary. Unfortunately, neither p-LDOCE5 search nor e-LDOCE5 noise contributed to strengthening the memory trace in a way which could positively influence retention. E-COBUILD6, by contrast, is much clearer and more neatly organised. In particular, there are no advertisements on its website, and dictionary information looks salient on the screen. Possibly that is why it was more useful than COBUILD6 in book form.
The above brief overview of selected recent studies on paper and electronic dictionary use reveals no obvious conclusions concerning the relative usefulness of these media for language reception, production and retention. As already pointed out above, the investigations differ in tasks, subjects, sampling methods, the monitoring of dictionary use and quantification, which naturally raises serious comparability issues. Unfortunately, the role of dictionary form in other respects, not discussed in the present paper, such as the speed of dictionary consultation, entry navigation, access paths or even dictionary appreciation, is no clearer, either (Dziemianko: In press).

1.2 The role of replication
The wide variety (and inconclusiveness) of research into the relative usefulness of paper and electronic dictionaries highlights the need for systematic replication. Commonly seen as merely repeating a study to see if the same results can be obtained (Lindsay and Ehrenberg 1993: 217, Abbuhl 2012: 296), replication constitutes a crucial scientific method. If carefully designed and conducted, it leads to results that can be generalised, rather than just isolated findings (Lindsay and Ehrenberg 1993: 216). It also increases confidence in the results and helps to establish the reliability of research (Seidlhofer 2003: 215, Gass et al. 2011: 210-211). It is even claimed that "the soundest empirical test of the reliability of data is provided by replicating" (Sidman 1960: 70) and "an isolated study remains virtually meaningless and useless in itself" (Lindsay and Ehrenberg 1993: 218). Gast (2009: 112) gives three reasons why it is worthwhile to replicate previous studies: to assess the reliability of findings (i.e., internal validity), to assess the generality of findings (i.e., external validity) and to look for exceptions (i.e., conditions under which the original findings do not apply). It is thanks to replication that the margin of error is reduced and confidence that findings are not accidental is strengthened. Systematic replication (whereby a researcher carries out a planned series of studies with systematic changes from one study to another and identifies them as a series) is particularly valuable as it makes it possible to establish the generality of findings, or see how broadly the results can generalise beyond the original experiment (Gast 2009: 111-112, 116, 121).
Currently, statistical significance is taken for the ultimate objective of a study, rather than just the first step. A statistically significant result means that it is unlikely to be a product of the sampling error and that it is probably real inasmuch as it is likely to be achieved if the whole population is tested. Yet, "[s]ignificance cannot and does not tell us whether the same result would hold again in a different population or under different conditions. To establish that would require much explicit replication" (Lindsay and Ehrenberg 1993: 218). Put differently, "one statistically significant finding cannot be accepted as 'the truth'; only when results are repeated in other studies can we have greater confidence that our decision to accept or reject a hypothesis is correct" (Abbuhl 2012: 306).
Apart from justifying the need for replications, it is necessary to reflect on how research can be replicated. Replications can be plotted along a continuum which extends from exact, through approximate, to conceptual replications, depending on how closely they resemble the original study (Abbuhl 2012: 297-300). Exact replications (also known as literal, strict or virtual), which consist in repeating the original study exactly or as exactly as possible, are largely unattainable, since no groups of subjects with all their idiosyncratic characteristics and experiences can be duplicated (Lindsay and Ehrenberg 1993: 200, Macaulay 2003: 78). In the case of approximate replications, also known as replications with changes (Abbuhl 2012: 298), the original study is repeated, but some (typically non-major) variables are modified, e.g., population, setting or task, yet comparability is not lost. The aim of such replications is to verify the generalisability of the results from the original study to a new population, setting or modality. In fact, the differences in the conditions of the consecutive studies are of the essence; it is they that make it possible to see whether results hold nevertheless (Lindsay and Ehrenberg 1993: 217).3 Finally, conceptual or constructive replications diverge from the original study to the largest extent; the same research question is investigated, but a different design is followed. In other words, the findings from an existing study supply the starting point, but researchers develop their own methodology. Such replications make it possible to distinguish between method-specific results and those which can be generalised, but the more variables are changed, the less comparable the original study and its conceptual replication become (Abbuhl 2012: 304).
Unfortunately, replication is held in relatively low esteem; it is considered to be inferior to original research (Umapathy 1987: 170) and lacking in prestige (Campbell 1986: 122). The "pressure to be original" (Park 2004: 194) and the mistaken view that any replication boils down to merely repeating an existing study exactly (Lindsay and Ehrenberg 1993: 220) contribute to the low regard for replication as a scientific method. Although its role in theory development cannot be overestimated, irrespective of whether it supports the tested theory or, perhaps even more importantly, does not, replication is seldom undertaken.
As regards research into dictionary use, the value of replication seems to be recognised; the method is claimed to be helpful for improving dictionaries and their usability for language learners (McCreary 2002: 182). However, there are relatively few studies openly acknowledged to be replications of some previous investigations, conducted with different degrees of modification (e.g., Greenbaum et al. 1984, Nesi and Meara 1991, Horst 1995, McCreary and Dolezal 1999, McCreary 2002, McCreary and Amacker 2006, Lew and Doroszewska 2009, Lew and Dziemianko 2006, Lew 2010b, Dziemianko 2011, Chen 2012).4 Admittedly, the study by Greenbaum et al. (1984), which replicates the survey by Quirk (1974), shows that the method has been employed in user-centred research for at least three decades. Yet, the small number of replications cannot be unmotivated. It might result from the fact that many studies on dictionary use are simply non-replicable (Hartmann 1987: 27). The low esteem in which replication is held is probably another factor which discourages researchers. Besides, it is by no means easy to ensure that the original study and its replication are closely comparable. Although replications are considered advisable when the researcher's aim is to make a new study parallel to an existing one (Lew 2002), direct comparisons can still be quite difficult to perform. For one thing, as pointed out above, exact replications are virtually nonexistent. For another, approximate replications, where the conditions whose influence is of particular interest are purposely varied, obviously give a chance for systematic comparison, provided that the other conditions remain unchanged. Yet, it takes time and effort to control the latter, which makes approximate replications difficult to accomplish successfully. Finally, the fact that not many researchers openly wish their investigations could be replicated in the future (McCreary and Dolezal 1999, Al-Ajmi 2002, Dziemianko 2006, Lew and Dziemianko 2006, Koyama and Takeuchi 2007, Tono 2011) suggests that, in fact, the awareness of the benefits which can be derived from replication might need to be raised. It is tacitly assumed that replication "carries more risk than potential reward for both the replicator and the originator of the research" (Park 2004: 194). After all, failure to obtain the same result might be seen as proof that the latter was wrong, or that the former is incompetent (Lindsay and Ehrenberg 1993: 218).
Indeed, although replications are said to be crucial "to distinguish the spurious from the real" (Abbuhl 2012: 306), there is a strong bias against negative findings. The file-drawer syndrome prevents the publication of many replications which do not support previous findings (Lindsay 1990, Park 2004: 194). Admittedly, confirming replications (whose results agree with those from the original investigation) are valuable inasmuch as they make the corroborated findings more credible. Yet, disconfirming replications are by no means worthless. Assuming that research is conversation, they prove that there is still a need to discuss the issue which turns out to be more complex than it seemed (Lindsay and Ehrenberg 1993: 218, Abbuhl 2012: 306). Besides, accounting for the divergent results provides ample scope for originality.
In an attempt to meet the need for systematic replication in research into dictionary use, the next part of the paper describes the second approximate replication of the study by Dziemianko (2010) and the obtained results.

Aim
As mentioned above, Dziemianko (2010) found that e-COBUILD6 was more useful in L2 reception, production and learning (retention of meaning and collocations) than COBUILD6 on paper. The results were not confirmed by the first approximate replication carried out by the author herself, where the paper and free online versions of LDOCE5 were employed. No statistically significant differences between the results obtained in the paper and electronic dictionary conditions were noted then in any task (Dziemianko 2011).
The aim of the present study is twofold. First, an attempt is made to investigate the usefulness of OALDCE7 in paper and electronic form for language reception, production and learning. Second, Dziemianko's (2010) findings concerning COBUILD6 are compared with those obtained from both replications.
The following research questions are answered:
1. Which version, paper or electronic, of OALDCE7 is more useful for L2 reception, production and learning (retention of meaning and collocations)?
2. Which dictionary (OALDCE7, LDOCE5 or COBUILD6) and in which form is most helpful in dealing with receptive and productive tasks, and which is the best learning tool?
The CD-ROM and regular printed versions of OALDCE7 were used. The choice of the seventh edition of the dictionary, rather than the latest one, was motivated by the number of copies of the dictionary in book form available in the experimental setting as well as by the functionalities of the electronic version. For one thing, there were enough paper copies of OALDCE7 to go around in the groups in which the study was conducted. For another, the CD-ROM version of OALDCE7 made it possible to see whether some search facilities which it offers (such as automatic scrolling or highlighting the entry for the looked-up word, not available in the online versions of LDOCE5 and COBUILD6) matter to dictionary users.

Materials and subjects
The materials used by Dziemianko (2010), i.e., the pretest, questionnaire, test and unexpected delayed post-test, were employed. The subjects did the same receptive and productive tasks as in the original study. In the receptive task, they explained the meaning of nine nouns and phrases (backgammon, booby prize, clampdown, collateral damage, down under, dream ticket, flapjack, onus, outcrop). The productive part consisted in completing sentences with prepositions removed from nine collocations (on the blink, in cahoots with, up the creek, at gunpoint, wreak havoc on, in the offing, in the pipeline, under sedation, on the trot). Both tasks featured in the pretest, test proper and retention test. The pretest served to sift out the cases where the subjects knew correct answers. It was accompanied by a questionnaire to gain an insight into the subjects' familiarity with dictionary formats. Once the pretest and the questionnaire had been completed, the test was administered. In the test, the subjects did the same tasks as in the pretest, but with access to either paper or electronic OALDCE7. In the delayed retention test conducted two weeks later, the sequence of the target structures was reshuffled and no access to dictionaries was allowed. The study was carried out in regular class time (45 minutes).
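The role of the pretest in scoring can be illustrated with a minimal sketch. It assumes, since the exact procedure is not spelled out here, that an item answered correctly in the pretest is simply left out of that subject's test and retention scores; the function name, the data layout and the sample answers are hypothetical.

```python
def proportion_correct(pretest_known, answers):
    """Share of correct answers among items the subject did not know at pretest.

    pretest_known: set of items answered correctly in the pretest (excluded).
    answers: dict mapping each item to True (correct) or False (incorrect).
    Returns None when every item was already known at pretest.
    """
    scored = {item: ok for item, ok in answers.items() if item not in pretest_known}
    if not scored:
        return None
    return sum(scored.values()) / len(scored)


# Hypothetical subject: knew "onus" beforehand, so only three items are scored.
known = {"onus"}
test_answers = {"onus": True, "flapjack": True, "outcrop": False, "clampdown": True}
print(proportion_correct(known, test_answers))
```

Under this assumption, prior knowledge cannot inflate the scores attributed to dictionary consultation, because known items never enter the denominator.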
Great care was taken to ensure that the subjects were as proficient as those in the original research. Overall, 86 students of English (B2-C1 in CEFR) at Poznań University took part in the study; 42 of them consulted p-OALDCE7 and the other 44 e-OALDCE7. The subjects' proficiency was determined on the basis of the grammar test in the practical English exam taken on a yearly basis. Importantly, the information obtained from the questionnaire indicates that in both experimental conditions the proportions of subjects consulting paper and electronic dictionaries as a matter of routine were comparable (the p-OALDCE7 group: students using paper dictionaries 66.7%, students using electronic dictionaries 69.0%, p=0.83; the e-OALDCE7 group: students using paper dictionaries 63.6%, students using electronic dictionaries 68.2%, p=0.68; Z test for dependent samples, non-significant, alpha-level=0.05).

Research question one (the usefulness of OALDCE7)
The mean proportions of correct answers in the main and retention tests are illustrated in Figure 1. The results of the repeated-measures ANOVAs for both tests are given in Table 1. In each test, the scores on each task were comparable among the users of paper and electronic versions of OALDCE7 at the accepted level of significance (alpha=0.05).
In the main test, the subjects provided over 90 percent of correct answers in each task. The differences in the main test scores between the paper and electronic conditions amounted to between 2 and 3 percentage points for reception (paper dictionary (PD): 93.2%, electronic dictionary (ED): 96.1%) and production (PD: 95.2%, ED: 93.1%). In the retention test, in turn, active recall in the paper dictionary group (PD: 36.2%) was about half as good again as in the electronic dictionary group (ED: 23.8%). For passive recall, the difference, still in favour of the paper dictionary, amounted to about 18 percent in relative terms (PD: 34% vs. ED: 28.7%). While the differences were statistically insignificant in the light of the ANOVA, their scale seems to suggest that if the sample had been bigger, they might have gained significance. Yet, the low values of the estimate of effect size (partial η²) computed for the retention test show that the size of each investigated main and interaction effect was very small, which means that only a modest proportion of the respective variance can be accounted for by a given (main or interaction) effect. In particular, only 7.5% of the between-subjects variance in retention scores can be attributed to dictionary form (FORM, partial η²=0.075).
The data show that only one interaction (DICTIONARY x FORM) was statistically significant (p=0.039, alpha=0.05; partial η²=0.127). To explore it in more depth, Table 3 shows the results of the Tukey Honest Significant Difference test.6 Figure 2 illustrates the interaction graphically. The results of the Tukey HSD test reveal that in the main test, e-COBUILD6 (98.6%) was more useful than COBUILD6 on paper (92.1%, cf. Dziemianko 2010). However, both versions of LDOCE5 and OALDCE7 were comparably helpful.

Retention test
Summary ANOVA results for the retention test are collated in Table 4. The data indicate that the main effects produced by DICTIONARY (p=0.000) and TASK (p=0.005) were statistically highly significant at alpha=0.05. Also, the effect sizes associated with these factors were large and medium, respectively (DICTIONARY: partial η²=0.465, TASK: partial η²=0.155). Table 5 gives the results of the Tukey HSD test for the two significant effects, illustrated graphically in Figure 3. First, the best retention was observed in the COBUILD6 group, where it exceeded 62% and was significantly better than in the other dictionary conditions. The retention results obtained after reference to OALDCE7 (30.7%) and LDOCE5 (37.5%), only about half as good as among COBUILD6 users, were comparable. Second, meaning (47.9%) proved much easier to remember than collocations (39.0%); passive recall was over one fourth more successful than active recall, and the difference was statistically significant at alpha=0.05.
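For readers less familiar with the effect-size measure reported here, partial η² is conventionally defined as the effect sum of squares divided by the sum of the effect and error sums of squares; equivalently, it can be recovered from an F ratio and its degrees of freedom. The sketch below illustrates both routes. The function names and input values are hypothetical and are not taken from Table 4.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from sums of squares: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form based on the F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical values chosen only to show that the two forms agree.
ss_effect, ss_error = 12.0, 48.0
df_effect, df_error = 2, 60
f_value = (ss_effect / df_effect) / (ss_error / df_error)

print(partial_eta_squared(ss_effect, ss_error))
print(partial_eta_squared_from_f(f_value, df_effect, df_error))
```

Read this way, a value of 0.465, as reported for DICTIONARY, corresponds to 46.5% of the combined effect and error variability being associated with that factor.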

The interaction DICTIONARY x FORM, which is not quite statistically significant but approaches significance (p=0.054, alpha=0.05; partial η²=0.115, Table 4), also merits further investigation. Results of the Tukey HSD test for the interaction in question are collated in Table 6. The relevant mean proportions are illustrated in Figure 4. Three main conclusions follow from the data. First, it transpires that there were no significant differences in retention between the users of paper and electronic versions of LDOCE5 and OALDCE7. Second, reference to e-COBUILD6 yielded significantly better retention results than reliance on the other e-dictionaries; e-COBUILD6 users (70.3%) remembered about 90 and 170 percent more than the subjects who referred to e-LDOCE5 (37.4%) and e-OALDCE7 (26.3%), respectively. Third, retention among the users of LDOCE5, OALDCE7 and COBUILD6 on paper was comparable. Even though reference to p-COBUILD6 (54.0%) yielded retention results which were about half as good again as those obtained after the consultation of p-OALDCE7 (35.1%) and p-LDOCE5 (37.6%), on the Tukey HSD test, the difference was not statistically significant at alpha=0.05.
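The relative comparisons in this paragraph follow from the reported mean proportions by simple arithmetic. The short sketch below, whose helper name is hypothetical, recomputes them from the figures quoted above.

```python
def percent_more(higher, lower):
    """Relative advantage of `higher` over `lower`, in percent."""
    return (higher - lower) / lower * 100


# Mean retention proportions (in percent) as quoted in the text.
e_cobuild6, e_ldoce5, e_oaldce7 = 70.3, 37.4, 26.3
p_cobuild6, p_ldoce5, p_oaldce7 = 54.0, 37.6, 35.1

print(round(percent_more(e_cobuild6, e_ldoce5)))   # e-COBUILD6 vs e-LDOCE5
print(round(percent_more(e_cobuild6, e_oaldce7)))  # e-COBUILD6 vs e-OALDCE7
print(round(percent_more(p_cobuild6, p_oaldce7)))  # p-COBUILD6 vs p-OALDCE7
print(round(percent_more(p_cobuild6, p_ldoce5)))   # p-COBUILD6 vs p-LDOCE5
```

The rounded results are 88, 167, 54 and 44 percent, which is consistent with "about 90 and 170 percent more" for the electronic versions and "about half as good again" for the paper ones.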

Discussion
Obviously, the replications led to conclusions different from those obtained in the original study. First of all, in contrast to Dziemianko's (2010) findings concerning COBUILD6, the e-versions of OALDCE7 and LDOCE5 proved to be no better for language reception, production and learning than the dictionaries in book form. Second, e-COBUILD6 was found to be a better learning tool than e-OALDCE7 and e-LDOCE5. It is thus necessary to reflect on the micro- or macrostructural features and factors not intrinsic to any dictionary structures which contributed to the success achieved with the help of e-COBUILD6 and prevented e-OALDCE7 and e-LDOCE5 from being likewise useful.
First of all, it is worth noting that the e-COBUILD6 website is quite crude; it is made up of the search window followed by the entry for the looked-up word and a few buttons on the right (to be clicked if users wish to expand their vocabulary, customise the dictionary or get help). In e-OALDCE7, in turn, the entry for the looked-up word, if short enough, is displayed along with the entries which follow it. This form of presentation resembles the paper dictionary and diverges from the approach adopted by e-COBUILD6, where only the entry for the looked-up word can be seen on the screen. Undoubtedly, the view of entries in a sequence must have naturally dispersed the subjects' attention and disturbed concentration. Such interface dissimilarities might be a reason why the retention scores of e-COBUILD6 users were better than those of the e-OALDCE7 group. The same factor might also account for the lack of any statistically significant difference between the results obtained with the help of the electronic and paper versions of OALDCE7 in the main and retention tests. In e-LDOCE5, by contrast, the entries for the headwords which follow the looked-up word are not displayed, but the website overflows with noise, thereby deflecting users from the dictionary itself and making lexicographic data much less salient and distinct (cf. Dziemianko 2011 and section 1.1). This could be a possible reason why e-LDOCE5 was no more helpful in any experimental task than p-LDOCE5.
Apart from the examination of interfaces, item analysis was conducted with a view to explaining the observed results. Looking at the data for individual target items, Dziemianko (2011) drew interesting conclusions about the role of clickable menus in e-LDOCE5, i.e., vertical menus which consist of several matches, each of which is hyperlinked to an entry or subentry. Figure 5 shows such a menu for blink.7 Accessing noun phrases through clickable menus in e-LDOCE5 was found to severely impede reception in comparison with p-LDOCE5. No similar effect of clickable menus was identified on production. However, they proved seriously detrimental to passive and active recall (in comparison with the menu-less access paths in e-COBUILD6). Dziemianko (2011) hypothesised that the mechanical rather than cognitive effort invested into coping with the hierarchical, step-wise outer access structure in e-LDOCE5, at which stage relevant semantic information is not processed yet, did not strengthen the memory trace, but actually prevented successful reception and retention.
OALDCE7 does not feature clickable menus similar to those in e-LDOCE5, but it offers a different functionality: automatic scrolling, whereby the looked-up compound, phrase or idiom not given headword status is immediately shown at the top of the screen. It is worth remembering that the results obtained by e-OALDCE7 users in the receptive task in the test proper were on average 3 percent better than in the group consulting p-OALDCE7 (cf. Figure 1). The largest difference in decoding scores between the experimental conditions was observed for down under, which in the paper version is given as the sixth of the seven idioms explained at the end of the 12-sense entry for down (adv). In the electronic version, in turn, down under is immediately shown at the top of the computer screen, its identical placement in the entry for down notwithstanding. Automatic scrolling to the phrase resulted in a 29 percent better score. Even though not quite statistically significant (p=0.080, Z test for independent samples, two-tailed, alpha=0.05), the difference was much beyond the aforementioned average (3 percent).
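The Z test for independent samples used in the item analysis can be sketched as follows. This is a generic pooled two-proportion test, not necessarily the exact procedure or software used in the study; the function name and the counts in the example are hypothetical, since item-level counts are not reported here.

```python
import math


def two_proportion_z_test(correct_1, n_1, correct_2, n_2):
    """Two-tailed pooled Z test for the difference between two independent proportions.

    Returns (z, p_value). Assumes the pooled proportion is strictly between 0 and 1.
    """
    p_1 = correct_1 / n_1
    p_2 = correct_2 / n_2
    pooled = (correct_1 + correct_2) / (n_1 + n_2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_error
    # Two-tailed p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 20 of 40 correct in one group, 10 of 40 in the other.
z, p = two_proportion_z_test(20, 40, 10, 40)
print(f"z = {z:.3f}, p = {p:.3f}")
```

With small groups, a sizeable difference in proportions for a single item can still fall short of the 0.05 threshold, which is in line with the caution exercised above.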
Interesting observations can be made about active recall, which was on average four times better among e-COBUILD6 users than among the subjects consulting e-OALDCE7. Item analysis reveals that this difference owes most to the collocation up the creek, retained over 13 times more often by the e-COBUILD6 group. This tremendous and statistically significant difference (p=0.000, Z test for independent samples, two-tailed, alpha=0.05) results most probably from the fact that the search for creek in e-OALDCE7 yields two matches. The first of them, a proper name irrelevant to the task at hand (Creek: a member of a Native American people, many of whom now live in the US state of Oklahoma), is highlighted, as shown in Figure 6. In e-COBUILD6, by contrast, up the creek constitutes the third subentry of creek, none of which is highlighted. The tentative conclusion which can be drawn from the data is that highlighting the entry for the searched word by default does not pay off when its homograph, treated in a separate entry (which is not highlighted), happens to be what dictionary users need. In such a case, default highlighting can result in immensely poorer retention. The second largest difference in active recall between the groups using e-COBUILD6 and e-OALDCE7 was observed for on the trot. The subjects who consulted the latter dictionary found the phrase in the section devoted to idioms, located at the end of the entry which consists of four verb senses, a subentry for the phrasal verb trot (sth) out and two noun senses. In e-COBUILD6, in turn, on the trot constitutes the third (final) subentry, but the two preceding verb subentries are quite short. Judging by the number of senses which separate the headword from the target phrase, the search path in e-OALDCE7 is three times longer than in e-COBUILD6. This might be a reason why e-COBUILD6 users were about 7 times more successful in active recall than the subjects who referred to e-OALDCE7. Apparently, then, the effort exerted to locate the phrase, as measured by entry length, is inversely related to active recall. In other words, the longer the entry is, the lower the chances of successful retention become. Yet, this hypothesis needs to be verified in further studies. It is worth noting that the results obtained in the main test for on the trot indicate that the phrase was extracted with comparable success from both dictionaries (97% in e-COBUILD6 and 97.7% in e-OALDCE7; p=0.840, Z test for independent samples, alpha=0.05). Such an observation supports the surprising findings by Nesi and Tan (2011), who noted that the senses at the end of the entry are identified with the greatest speed and accuracy by dictionary users, followed by those which are given first. The regularity observed in the entry for trot not only confirms the saliency of the entry-final position, but also suggests that the effect persists regardless of entry length.8 Nonetheless, it transpires that the saliency of entry-final positions has widely different consequences for entry navigation (i.e., finding the needed information) on the one hand, and retention on the other.
The foregoing discussion makes it possible to formulate a few suggestions for further research into e-dictionary use. First, it appears that the role of noise on dictionary websites is worth looking into. It goes without saying that advertisements make online dictionaries accessible to anyone free of charge. No wonder, then, that ad-supported online dictionaries are enjoying considerable popularity.9 Nonetheless, it is open to question whether dictionary websites with and without advertisements are comparably useful. The tentative conclusion following from the present investigation is that unsolicited promotional material diverts users' attention from lexicographic data and actually deprives an online dictionary of much of its usefulness. Second, the effect of the hierarchical nature of data display in electronic dictionaries on retention is another promising area of research. The above assessment of the possible influence of clickable menus on retention, and active recall in particular, is quite pessimistic, but systematic manipulation of fabricated microstructures is necessary to get a deeper insight into the actual significance of clickable menus in electronic dictionaries. Admittedly, research into clickable menus as access facilitators was taken up by Lew and Tokarek (2010), who concluded that such tools help lower-level students navigate a dictionary entry and get to the right sense, but are of no real benefit to advanced users. Apart from regular clickable menus, the authors looked into the usefulness of clickable menus where the target sense was automatically highlighted. Such menus proved comparably useful at both proficiency levels. However, no attempt has yet been made to investigate the effect of menus in paper or electronic dictionaries on retention (cf. Nesi and Tan 2011, Tono 2011, Lew 2010b). Third, it might be useful to explore the influence which highlighting entries in electronic dictionaries exerts on active and passive recall in the case of homographs treated in different entries, only one of which is highlighted. While highlighting entries by default seems attractive, it transpires that bringing out the entry which does not feature the information that a user wishes to find has a negative impact on retention. At this stage it is worth distinguishing between highlighting entries and highlighting specific senses. The latter was found to be a welcome navigation enhancement in polysemous microstructures, where it assists users in reaching the relevant sense more quickly and accurately (Lew and Tokarek 2010).
Unfortunately, the present study is not free from limitations. First, a number of subject variables were not controlled. Only the subjects' familiarity with dictionary formats and proficiency in English were taken into consideration, since they were considered most likely to immediately affect dictionary use and language skills. Besides, it needs to be remembered that real dictionaries rather than systematically manipulated microstructures were employed. Such an approach resulted in a naturalistic task, but it made it difficult to pin down specific factors responsible for the observed effects. To establish the role of selected factors, entries need to be fabricated and systematically manipulated, which no doubt creates more tightly controlled, albeit more artificial, conditions. The use of actual paper and electronic dictionaries also means that dictionary form alone may not be the key factor which determines the effectiveness of dictionary consultation. Specific solutions adopted and form-independent typographical structural indicators (Gouws 2003), such as font size and colour, line spacing or layout, which remained beyond control in the studies discussed above, can play an important role in dictionary use (cf. Lew 2010a: 294, Nesi: In press). To reduce their influence, printouts of the electronic dictionary screen display could be used instead of a real paper dictionary. Such task operationalisation could help to isolate the factor of dictionary form (on-screen vs. paper) and free it of the effect produced by typographical structural indicators (cf. Chen 2012). Nonetheless, in this way the paper dictionary user is also largely helped inasmuch as only mini-dictionaries covering the key items rather than complete paper dictionaries are typically produced from printouts, which seriously limits and simplifies outer access (Bergenholtz and Gouws 2007: 243).10
All in all, whereas the present study proved to be quite exploratory in nature at the stage of item analysis, it made it possible to develop a few testable hypotheses which merit further attention. In this way it hopefully confirmed that replication as a research method does not entail lack of originality. Importantly, it also showed that approximate replication helps to validate theories and substantiate generalisations. Ultimately, it is replications that contribute to making research a truly accretive process whereby knowledge is accumulated and consolidated over time, and, by the same token, prevent a discipline from being composed of scattered hypotheses and observations (cf. Santos 1989).
1. In the article, where differences between the dictionaries and their forms are of the utmost importance, the aforementioned, generally accepted acronyms are used for the sake of convenience. In the list of references, full bibliographic information is provided under the names of the respective dictionary editors, not repeated below: Mayor (2009) - LDOCE5, Sinclair (2008) - COBUILD6 and Wehmeier (2005) - OALDCE7.
2. For an overview, see Dziemianko (In press).
3. Naturally, the greater the differences are, the higher the risk that the effect will not be replicated. Yet, if it is confirmed, its generality increases (Gast 2009: 111). By the same token, "failure to replicate or follow up on studies with different populations and in different contexts may lead to de facto generalisation" (Duff 2006: 71).
4. Compare a similar remark made by Chi (2009: 14), who also notes the paucity of replications in the field of dictionary use.
5. In any ANOVA discussed below, TASK was the repeated-measures factor.
6. All the means connected by (****) in one column are not different from each other at p=0.05.
7. The screenshot also gives an insight into the amount of noise on the e-LDOCE5 website.
8. Only five-sense entries were employed in the study by Nesi and Tan (2011).
9. See also Lew (2011).
10. Proponents of the Involvement Load Hypothesis would no doubt claim that simplified outer access can affect retention results, the assumption being that any effort invested in word search, including mechanical page turning and scanning running heads, can increase the chances of successful retention. On the other hand, it is suggested that not any involvement, but only semantic involvement affects vocabulary retention in the process of dictionary use. The aforementioned, largely automatic stages of paper dictionary look-up might not yet evoke adequate semantic or cognitive involvement to influence vocabulary retention (cf. Craik and Lockhart 1972, Dziemianko: In press). Besides, printouts of an electronic dictionary prevent users from scanning entries close to the target ones, which might also affect retention (Chen 2012).

Figure 2: DICTIONARY x FORM: Correct answers (mean %) in the main test

Figure 3: DICTIONARY and TASK: Correct answers (mean %) in the retention test

Figure 4: DICTIONARY x FORM: Correct answers (mean %) in the retention test

Figure 6: The highlighted entry for Creek in e-OALDCE7

Table 1: ANOVA summary results (main and retention tests): OALDCE7 (see note 5)

Figure 1: Results obtained in the main and retention tests (OALDCE7)