Dead-lift with kettlebells: Pairs of kettlebells weighing 24 kg, 32 kg and 40 kg each were placed in three lines on the floor. Even though face validity is a scientifically inferior method, it serves an important purpose, since tests or instruments without established face validity may not be relevant [21]. The Ranger test (a lower-limb functional capacity test) [2]: Wearing a 20 kg backpack, the subjects stood with the left foot on a 0.40 m high bench and performed a step-up with the right foot. One proportion-agreement method, the Content Validity Index (CVI), quantitatively estimates content validity [22,23]. Face validity is often seen as the weakest form of validity, and it is usually desirable to establish that a survey has other forms of validity in addition to face and content validity. The experts represented Canada (n = 1), the Netherlands (n = 1) and Sweden (n = 7). In line with the recommendation by Lynn [22] of a minimum of five and a maximum of ten experts to avoid possible random consensus [22,23], a total of nine experts agreed to participate. The number of correctly performed lifts was counted. Conceived and designed the experiments: HL MT LB. As described in detail by Malmberg [6], the subjects looked ahead and kept their body in a completely straight position, with the head and spine aligned with the legs, for as long as possible (s). The inter-rater reliability was high (intraclass correlation coefficient, ICC(2,1) = 0.99) for all five tests. The regional Ethics Committee in Stockholm, Sweden, approved the study (Dnr: 2012/1690-32). An inter-rater reliability investigation of five of the tests, identified on the basis of the CVI evaluation (the three highest-ranked tests supplemented by two commonly used military tests to ensure that domains such as upper limbs and core stability were explicitly covered), was performed with four raters who simultaneously measured the same group of subjects.
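As an illustration of the ICC(2,1) statistic named above, the following is a minimal sketch of the standard two-way random-effects, absolute-agreement, single-rater formula computed from a subjects-by-raters table. The function name `icc_2_1` and the example scores are hypothetical and are not taken from the study; the text does not state which software the authors used.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per subject, one column per rater.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical data: 5 subjects, each scored by 4 raters (not study data).
example_scores = [
    [12, 12, 13, 12],
    [20, 19, 20, 20],
    [8, 8, 8, 9],
    [15, 15, 16, 15],
    [25, 24, 25, 25],
]
print(round(icc_2_1(example_scores), 3))
```

Because the formula penalises systematic differences between raters as well as random disagreement, values close to 1 indicate that the raters produced nearly interchangeable counts or times for each subject.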
As seen in Figs 1 and 2, the Bland-Altman plots represent tests with high and low reliability, respectively. Department of Neurobiology, Care Sciences and Society, Division of Physiotherapy, Karolinska Institutet, SE-141 83, Huddinge, Sweden. Hence, to ensure that soldiers have sufficient physical work capacity to complete military work tasks (i.e. reliability of the measuring instrument (questionnaire)). Department of Surgery and Perioperative Science, Umeå University, SE-908 87, Umeå, Sweden. Another reason for including this test was earlier findings indicating that a lack of strength correlated with pain in the lower back. The raters were physiotherapists who had more than seven years of clinical experience and were very familiar with testing procedures. That is, when all nine experts agreed, the CVI = 1.0, and when eight out of nine experts agreed, the CVI = 0.89. Thirty-seven healthy engineer soldiers (33 men and 4 women) volunteered to participate (Table 1). This test was less acceptable than the Ranger test and the dead-lift with kettlebells. The movement pattern during a kettlebell lift is similar to that of demanding material-handling, lifting and carrying tasks. The tests had a CVI of 1.00 for the entire content domain. The ability to maintain alignment while performing different military tasks was also discussed in the consensus group. Strength and endurance tests are commonly used during selection and regular testing procedures [2,5–8]. Relative intra-rater reliability was found for all included tests. The Ranger test, dead-lift with kettlebells, back extension each leg loading with a 20 kg backpack. The consensus panel supplemented the test relative reliability. Researchers have defined and calculated the CVI. Conscripts from military service [2] or related to shorter maximal contraction times and functional disability [17,38,39]. Data availability: All relevant data are within the paper and its supporting information files.
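The CVI figures quoted above (1.0, and 0.89 when eight out of nine experts agreed) follow from a simple proportion. A minimal sketch, assuming the CVI for a test is the number of experts who agree divided by the total number of experts; the function name `item_cvi` is hypothetical.

```python
def item_cvi(n_agreeing, n_experts):
    """Proportion of experts who agree on a test (assumed CVI definition)."""
    if n_experts <= 0 or not 0 <= n_agreeing <= n_experts:
        raise ValueError("need 0 <= n_agreeing <= n_experts and n_experts > 0")
    return n_agreeing / n_experts


# Panel size of nine experts, as reported in the text.
print(round(item_cvi(9, 9), 2))  # 1.0
print(round(item_cvi(8, 9), 2))  # 0.89
```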