The methodology of Collaborative Virtual Environment (CVE) evaluations is constrained by the peculiarities of CVE technology and by the developmental phase of the software. These constraints pose a number of threats to the validity and reliability of CVE usability evaluations. The threats are identified below, and recommendations are made towards solving them.
Because CVE production often takes place within the developmental phase of the production cycle, the final product is usually a prototype or demonstrator. The boundaries of existing technology are pushed to create new ways of doing things, and as a result even more new things become possible. One side effect of evaluating a prototype that is still under development is that there are no manuals, no fully functioning application, and few opportunities to recruit properly representative subjects from the population of intended users for the usability studies.
Because of the prototypical nature of CVE software, the usability researcher is limited in the kind of experimental subjects s/he can get; often these subjects are the developers of the software and their nearest colleagues. One consequence of this constraint is that it is not possible to use a random sample of subjects. In order to generalise the findings from an experiment with a small group of subjects to a larger population, the sample of subjects must be a random choice from the set of representative members of that larger population. Obviously, developers of CVEs are not representative members of the future group of intended users of that CVE, and they have not been randomly chosen. In fact, these developers are a highly specific selection of subjects, and the data gathered from their behaviour will have to be interpreted with this knowledge in mind.
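The sampling distinction above can be made concrete with a minimal sketch. The population, sample sizes, and names below are all hypothetical placeholders; the point is only that a random sample is drawn from the target population itself, whereas a convenience sample of developers is not.

```python
import random

# Hypothetical pool of intended users (names are placeholders).
intended_users = [f"user_{i}" for i in range(1000)]

# A proper random sample: every member of the target population
# has an equal chance of being selected.
random.seed(42)
random_sample = random.sample(intended_users, 10)

# A convenience sample, as often forced by prototype evaluation:
# developers and close colleagues, drawn from outside the target
# population and not chosen at random.
convenience_sample = ["developer_1", "developer_2", "colleague_1"]

print(len(random_sample))                          # 10
print(set(random_sample) <= set(intended_users))   # True
print(set(convenience_sample) <= set(intended_users))  # False
```

Only findings from the random sample generalise to the intended users; the convenience sample supports, at best, exploratory conclusions that must be interpreted with its bias in mind.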
Another characteristic of CVEs is that their users are geographically distributed. A CVE allows multiple users to interact simultaneously within the CVE in real-time, regardless of the physical location of these users. This means that if the CVE is tested for usability, it has to be tested in its distributed functioning for multiple users, as well as for the direct interface offered to the single user. One implication of the distributed character of the CVE application and its users is that it becomes more difficult to conduct properly controlled experiments.
The physical distribution of the users constrains the degree to which the researcher can control the experimental setting. A typical concern for experimental set-ups is that they should be as similar as possible for each subject in each condition of the study. In order to claim with confidence that an observed difference in behaviour is attributable to a specific implementation of a construct in the CVE, the researcher needs to rule out any influences on the user other than the intended ones. This means that the environments of the users should be as similar as possible, the researcher has to behave as similarly as possible with each subject in the study, questionnaires should be answered at times as similar as possible, subjects within one group should be as similar as possible, subjects within one group should receive similar treatment, and so on. When conducting distributed usability studies this becomes a complicated task, because the researcher cannot be in all places simultaneously in order to guarantee similar treatment of all subjects.
Another consequence of evaluating prototypes is that it is often not feasible, within the time and effort available, to create two (or more) different situations for an experiment. This means that the independent variables cannot always be manipulated. For instance, a researcher may have found that having a personal shadow in a CVE assists orientation and wayfinding. In order to find out which kind of shadow is most effective, the researcher needs at least two versions of the CVE, each with a different kind of shadow, and preferably one CVE without any shadows at all for the control group. These three versions of the CVE constitute the manipulation of the variable 'shadow'. The group that performs best on the orientation and wayfinding tasks points to the CVE with the best shadow. Obviously this is an informative but labour-intensive way of gathering knowledge, which may not always be possible. Thus, the researcher is sometimes so limited in the ways of researching possible CVE design solutions that an experimental design is impossible.
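The between-groups design just described can be sketched as follows. The condition names, subject count, and score generation below are all hypothetical; in a real study the scores would be wayfinding performance measured in each CVE version.

```python
import random
from statistics import mean

# The three hypothetical CVE versions: the manipulation of 'shadow'.
conditions = ["no_shadow", "shadow_a", "shadow_b"]

random.seed(0)
subjects = list(range(30))
random.shuffle(subjects)

# Randomly assign each subject to one of the three CVE versions,
# yielding three groups of equal size.
groups = {c: subjects[i::3] for i, c in enumerate(conditions)}

# Placeholder wayfinding scores per subject; purely illustrative.
scores = {c: [random.gauss(50 + 5 * i, 10) for _ in groups[c]]
          for i, c in enumerate(conditions)}

# The condition with the highest mean score points to the most
# effective shadow implementation.
best = max(conditions, key=lambda c: mean(scores[c]))
print(best)
```

Random assignment to conditions is what licenses the causal claim: if only the shadow implementation differs between groups, the difference in mean scores can be attributed to it. This is exactly what becomes infeasible when only one prototype version exists.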
There are numerous threats to the validity and generalisability of research findings on human behaviour. The threats that are particularly hazardous for CVE evaluation, because of the nature of CVEs, have been listed above. The solution to these problems is to regard them as constraints, not to abandon the research. We employ the best research design we can when we recognise these limitations and attempt to overcome them as ably as we can. Exploring human behaviour with and within CVEs yields a rich source of information, which is an important component of phases 2 & 3 of the empirical cycle of scientific inquiry, and subsequently of future empirical research on CVEs.