This document will help you author surveys for ODK Collect using xlsform syntax. ODK Collect is a powerful tool to rapidly gather complex data types in the field using Android smartphones. The software is free, the source code is open, and the community is very active. Using ODK Collect with Formhub requires the creation of an Excel file that contains the questions, formatting instructions and data validation conditions that will allow enumerators collect data on smartphones. The following module will cover all the syntax elements you will need to author basic and advanced surveys.
Here are the steps involved in writing a survey with the XLSform format and getting it quickly deployed on formhub.
Survey authoring in Excel is a simple process once you understand the basic XLSform syntax.
To make your learning process smoother a generic household survey example will be used across the training modules. This sample survey is a subset of the CDC Household Water Use and Health Survey for a Water Safety Plan. You can download the full Excel survey file form Formhub University, the name of the form is household_survey.
This form would look like the following on your Android smartphone:
Important rules to remember:
Now let’s start populating the survey questionnaire. The first section will likely include basic information such as the respondent’s name, age of the respondent, household number, etc. We already know how to collect the name. For age, household number, and other fields, ODK lets us specify the “type” of the information to be entered (numbers, dates, etc). This will help us reduce data entry errors. To do this we can use a different question ‘type’, ODK Collect has defined a number question types to support all data types that surveys usually collect. Here we present some of the most basic types ODK Collect supports (you’ll see the full list later).
text | Text input. |
integer | Integer (ie, whole number) input. |
decimal | Decimal input. |
Let’s add the household id and respondent’s age as numeric integer fields and the respondent’s name as a simple alphanumeric field. This will require the use of the ‘type’ ‘text’ and the ‘type’ ‘integer’.
An integer field would look like this on the Android smartphone.
Notice how the numeric keypad appears automatically and the alphabetic keyboard is deactivated.
For those of you coming from paper surveys, you will appreciate that Android smartphones can help us collect “new” kinds of data, like GPS co-ordinates, images, audio, video, and information encoded in barcodes and QR codes. For now, we’ll introduce you to two additional questions: ‘types’ (‘geopoint’ and ‘image’).
geopoint | Collect GPS coordinates. |
image | Take a photograph. |
Let’s collect the gps coordinates and a picture of the household using the following XLSform. To make this tutorial easier to read we will switch from the Excel file screenshots to a tabular notation.
Excel 'survey' worksheet
type | name | label |
text | interviewer | A.1.0 Interviewer name |
integer | hh_id | A.1.1 Enter the household number |
geopoint | hh_location | A.1.2 Collect the GPS coordinates of this household |
image | hh_photo | A.1.3 Take a picture of the structure where the household lives |
text | respondent_name | A.1.4 Enter the full name of the respondent |
integer | respondent_age | A.1.5 Enter the age of the respondent |
Notice how the Excel ‘a_worksheet_name’ worksheet preceding the table specifies on which worksheet of the workbook we are working on (we have been working on the Excel ‘survey’ worksheet for now but other worksheets will be added as we develop a more advanced survey).
A geopoint type An image type
Because the XLSform syntax requires a very specific set of instructions, it is quite easy to make errors in the Excel file. Let’s say we misspelled the ‘image’ keyword and wrote ‘images’ instead.
When uploading to Formhub a form with errors, the system will show an error message that will give you an idea of what the error was. In this case Formhub is telling us it does not recognize ‘images’ as a valid question ‘type’.
The process of uploading forms, and correcting errors is called debugging. Let’s debug our form and change ‘images’ for ‘image’ in the ‘type’ column for question A.1.3. (Hint: The “Find” feature in the “Edit” menu of Excel is often helpful here). If we upload the form again we should see the following screen.
It is good practice to upload your XLSform (.xls file) to formhub on regular intervals. Formhub will show only an error message for the first problem it finds with a description of where the error is located. The fewer the changes since your last upload, the easier it is to find and solve the error formhub finds. You should get in the habit of testing your surveys, identifying, and fixing early on.
Sometimes, even if your form successfully uploads to formhub, it may not behave exactly as you intended on the Android smartphone. Therefore, although less frequently, it is also a good idea to test your form as a webform as well as directly on an Android smartphone.
At this stage, let’s try to upload the current file into Formhub and test the form as a webform:
Be aware that the webform mode does not mimic the exact same format that will be displayed on the Android smartphone. You will still need to test the form on an Android smartphone but this intermediate step will let you test many aspects of the question flow and that data entry constraints are working correctly.
...
Notice that our form has three submissions. Download the xls version of the survey results.
The excel survey results are presented in tabular format, each column representing one question. Notice that the name of each column comes from our XLSform’s ‘name’ column.
Notes
Another question ‘type’ that can be very useful to provide information to your enumerator while conducting a survey are notes. Notes are no-input text prompts which can give the enumerator information to read to the respondent, remind her that she is now moving into a new section of the survey, or any other hints or messages to respond to.
Excel 'survey' worksheet
type | name | label |
note | note_consent | Administer informed consent. If subject agrees to participate, proceed to questionnaire. |
Dates
Dates are fairly intuitive, they help collect a specific calendar date.
Excel 'survey' worksheet
type | name | label |
date | respondent_dob | A.1.5 Enter the date of birth of the respondent |
As we continue with the household survey, we are interested in the gender of the respondent, the number of people living in the household and the different sources of water used in the household. For these questions, we could simply ask text and integer questions, but often it is useful to standardize data by asking enumerators multiple choice questions. By using a list of predefined choices in survey questions, we can increase the value of data for quantitative analysis later. This technique obviates the need for “coding” data in many cases, and reduces errors introduced by typos and the need to interpret data.
Multiple Choice Questions
ODK Collect has support for both ‘select one option’ (‘select_one’) and ‘select all options that apply’ (‘select_multiple’) questions. Writing a multiple choice question requires some additional steps to the other types we have used up to now. We need to specify what the predefined list of choices will be!
select_one [choices] | Multiple choice question; only one answer can be selected (the [] notation indicates that you will need to replace [choices] by a name in the choices sheet). |
select_multiple [choices] | Multiple choice question; multiple answers can be selected (the [] notation indicates that you will need to replace [choices] by a name in the choices sheet). |
Here is an example of a ‘select one’ question to restrict the gender to ‘Male’ and ‘Female’:
Note that gender_list appears in both worksheets, this is how the elements are linked between the choice list and the actual survey question. A choice list can be re-used in multiple survey questions.
A more elaborate multiple question that only accepts one selection would be the question on ownership of the house.
Excel 'survey' worksheet
type | name | label |
select_one ownership_list | hh_ownership | A.1.9 This house is … |
Excel 'choices' worksheet
list name | name | label |
ownership_list | own | Owned |
ownership_list | rent | Rented/Leased |
ownership_list | rent_free | Rent-Free |
ownership_list | informal | Informal Settlement |
ownership_list | other | Other |
We can also add multiple choices questions that allow multiple answers to be selected. For example in our survey it could be that households obtain water from different sources for different purposes or in different seasons (dry vs wet season).
Excel 'survey' worksheet
type | name | label |
select_multiple water_sources | hh_water_source | B.1.1 Where do you get the water you use at home? (select all that apply) |
Excel 'choices' worksheet
list name | name | label |
water_sources | house_tap | Household tap |
water_sources | private_tap | Private tap in yard |
water_sources | public_tap | Public/shared standpipe |
water_sources | neighbor_tap | Neighbor's tap |
water_sources | bottled | Purchased bottle water |
water_sources | rain_collect | Rain water collection |
water_sources | river | River/Stream/Creek |
water_sources | spring | Spring |
water_sources | ref_station | Refiling station |
water_sources | other | Other |
Things to remember for multiple choice questions:
This is how the survey result data for multiple questions will look like.
Notice that for the select one option type of questions the responses are in one column named as defined under the ‘name’ column in the Excel 'survey' worksheet. The content of the cells is the name of each alternative as defined in the Excel 'choices' worksheet under the ‘name’ column.
For select all that apply type of questions one column is created for each option defined in the Excel 'choices' worksheet and the name is the name of the question as defined under the ‘name’ column in the Excel 'survey' worksheet followed by ‘/’ and the name of each choice as defined in the ‘name’ column of the Excel 'choices' worksheet. The content of the cells is TRUE or FALSE depending on if that particular option was selected or not.
It is very easy in xlsform to add multiple languages to a survey. To do this, you have to essentially just create a difference label column for each of your languages. For example a form with the columns label::English and label::French etc. will allow the surveyors to choose between English and French. On ODK, to select a different language, press the Menu key, and the “Change Language” option. For the form below, “English” and “French” will show up as the possible options.
Excel 'survey' worksheet
type | name | label::English | label::French |
select_one gender_list | respondent_gender | A.1.7 Enter the gender of the respondent | A.1.7 Indiquez le sexe du sondé |
Make sure that your column headers in the “choices” sheet match those in the “survey” sheet (i.e. there will be errors if you columns are “label::English” on the survey and “label” on the choices worksheet)
Excel 'choices' worksheet
list name | name | label::English | label::French |
gender_list | male | Male | Masculin |
gender_list | female | Female | Feminin |
[Advanced: See the xlsform standard document to see exactly what kinds of column headers can accept a language modification.]
Hints
When asking questions, it often makes sense to give enumerators some extra guidance. You can already do this with notes, but what about something simpler that doesn’t require an extra screen? The answer is “hints,” which are listed along-side the question in italicized font (questions are in bold font). To use them, just create new column in your survey worksheet named ‘hint’, and add your hints.
Excel 'survey' worksheet
type | name | label | hint |
note | note_consent | Administer informed consent. If subject agrees to participate, proceed to questionnaire. |
|
text | interviewer | A.1.0 Interviewer name |
|
integer | hh_id | A.1.1 Enter the household number |
|
geopoint | hh_location | A.1.2 Collect the GPS coordinates of this household | Make sure the precision is less than 25 meters |
image | hh_photo | A.1.3 Take a picture of the structure where the household lives | Take a picture of the front of the house |
text | respondent_name | A.1.4 Enter the full name of the respondent | First, Middle and Last name |
date | respondent_dob | A.1.5 Enter the date of birth of the respondent |
|
integer | respondent_age | A.1.6 Enter the age of the respondent | If respondent does not know put 999 |
select_one gender_list | respondent_gender | A.1.7 Enter the gender of the respondent | (select one) |
integer | hh_people_nb | A.1.8 How many people live in this household? | Including the respondent |
select_one owner_list | hh_ownership | A.1.9 This house is … | (select one) |
note | note_section_b | Section B - Water use practices. This section is composed of 23 questions related to the household usage of water. |
|
select_multiple water_sources | hh_water_source | B.1.1 Where do you get the water you use at home? | (select all that apply) |
Question A.1.6 would now look like this on the Android smartphone.
You can also have your hints in different languages. For example, the above form could use two additional columns “hint::English” and “hint::French” to provide hints in both of those languages.
Metadata
In surveying efforts, it is often useful to collect additional data for administrative purposes. For example, you might want to know the time it’s taking enumerators to conduct the full survey. During the pilot stage, identifying which phone was used to collect certain GPS coordinates might help the detection of faulty GPS systems (important at scale). Other similar informaiton can be used for other administrative and review processes; the “start” and “end” date-time fields are particularly helpful when checking for duplicate information entry.
ODK Collect makes a number of metadata fields available for collection:
start | Start date and time of the survey. |
end | End date and time of the survey. |
today | Day of the survey. |
deviceid | IMEI (International Mobile Equipment Identity) |
subscriberid | IMSI (International Mobile Subscriber Identity) |
imei | SIM serial number (if available). |
phonenumber | Phone number (if available). |
If we wanted our survey to collect all of this metadata we would add the following rows at the top of the survey (they can be anywhere but it is a best practice to put metadata fields at the beggining of the file):
Excel 'survey' worksheet
type | name | label |
start | start | Start Time |
end | end | End Time |
today | today | Date of Survey |
deviceid | deviceid | Phone Serial Number |
subscriberid | subscriberid | Subscriber Identifier |
imei | simserial | SIM serial |
phonenumber | phonenumber | Phone number |
These questions do not appear on the phone at all, but will be automatically included in the data ODK Collect writes to store and transmit the survey data. To see these fields at work, fill out a new blank form on your Android smartphone, save it and send it to the server. When you download the data you will see that new columns have been added to the survey data.
In order to reduce input errors ODK Collect has some options to control and restrict the way data is entered by enumerators. Three types of constructs are available:
ODK allows the survey author to designate certain questions as “required” questions. Normally, enumerators are able to “skip” answering any question by simply swiping forward on a page. However, many surveyors prefer that enumerators enter a specific value (such as “-999” for “Not available”) rather than simply skip question. ODK allows you to enforce this; simply column labeled ‘required’ and filling the cells with the reserved word ‘yes’ to the questions that you consider critical to have in your data.
Excel 'survey' worksheet
type | name | label | required |
note | note_consent | Administer informed consent. If subject agrees to participate, proceed to questionnaire. |
|
text | interviewer | A.1.0 Interviewer name |
|
integer | hh_id | A.1.1 Enter the household number | yes |
geopoint | hh_location | A.1.2 Collect the GPS coordinates of this household |
|
image | hh_photo | A.1.3 Take a picture of the structure where the household lives |
|
text | respondent_name | A.1.4 Enter the full name of the respondent | yes |
date | respondent_dob | A.1.5 Enter the date of birth of the respondent | yes |
integer | respondent_age | A.1.6 Enter the age of the respondent | yes |
select_one gender_list | respondent_gender | A.1.7 Enter the gender of the respondent |
|
integer | hh_people_nb | A.1.8 How many people live in this household? |
|
select_one owner_list | hh_ownership | A.1.9 This house is … |
|
note | note_section_b | Section B - Water use practices. This section is composed of 23 questions related to the houshold usage of water. |
|
select_multiple water_sources | hh_water_source | B.1.1 Where do you get the water you use at home? | yes |
If you test the form, and try to swipe ahead on (for example) question A.1.6, you will not be able to move forward, and instead, get a prompt saying “Sorry, this response is required!”
Caveat: Be careful when adding conditional constructs to a survey or the enumerator might find herself incapable of finishing the survey (if the GPS on the phone is not working and you have required the collection of geographic coordinates, enumerators might not be able to move forward to the next sections of the survey). It is also important to test every survey with conditional constructs as much as possible before deployment. For text and integer fields, be sure to specify what should be entered in case the respondent is not able to answer the enumerator.
Certain questions only have a limited range of possible answers that can be considered as valid. If, for example, you ask a household how many days of the week they have to fetch water, the response can only be a number between 0 and 7. ODK Collect handles this type of data validation with the use of the ‘constraint’ column.
Excel 'survey' worksheet
type | name | label | constraint |
integer | respondent_age | A.1.6 Enter the age of the respondent | . >= 0 |
Here we’ve put a constraint on age so that negative values are not accepted. ODK Collect will not allow the enumerator to swipe to the next question if she enters -5. These constraints can help protect from typos and other unintentional errors.
The constraint formulas can be fairly complex, accomodating the use of and, or and not operators. Lets say that our age limit is 150 (much lower than the oldest person every known to exist), and that we want to allow for the enumerator to enter 999 in case the respondent cannot answer the question. The constraint can be written as follows.
Excel 'survey' worksheet
type | name | label | constraint | constraint_message |
integer | respondent_age | A.1.6 Enter the age of the respondent | (. >= 0 and .<=150) or .=999 | age has to b between 0 and 150, enter 999 if unknown |
In fact, we have even added a new column labeled ‘constraint_message’ that specified the message displayed on the screen (the usual popup message is “Sorry, this response is invalid”):
3.3 Referencing a previous question
Let’s move on, and add a question on the number of household occupants that have more than 18 years old. Here, the number of adults in the household has to be smaller than or equal to the total number of people living in the household.
XLSform can let you reference the results of a previous question by using a very specific syntax. ${name_of_previous_question} will actually fetch the value entered by the enumerator in the question that has a name (as defined in the ‘name’ column of the Excel 'survey' worksheet). Notice that the special character ‘.’ will be interpreted by ODK Collect as a reference to the current question (as if we had typed ${name_of_current_question}).
Excel 'survey' worksheet
type | name | label | constraint | constraint_message |
integer | hh_people_nb | A.1.8 How many people live in this household? |
|
|
integer | hh_adult_nb | A.1.9 How many adults live in this household? | . <= ${hh_people_nb} | has to be smaller than or equal to the number of people in household |
So, here, the answer to question A.1.9 is constrained to be less than or equal to the answer given to question A.1.8.
[Advanced: The column ‘constraint’ takes an XPath expression. The ${QUESTION_NAME} notation generates a valid XPath formula. A question is accepted if the ‘constraint’ column evaluates to true.
Many surveys have questions that should be asked depending on the answer given to previous questions. One great feature of ODK Collect is the ability to only present a question to the enumerator if a condition, based on the response to a previous question, is true.
For example, if you presented a multiple choice question and the answer by the interviewee does not correspond to any of the pre-programmed choices, you might want to offer ‘Other‘ as an alternative. If this is the case you would need to ask the interviewee to specify what other means. To do this we add the ‘relevant’ column and enter a condition that specifies when the questions is relevant and should be presented to the enumerator.
You can interpret this conditions as: “Only ask this question if the answer to the previous question: name_of_question (as defined in the ‘name’ column of the Excel 'survey' worksheet) was equal to ‘option_selected’ (as defined in the ‘name’ column of the Excel 'choices' worksheet)”.
Notice that in the case of ‘select_multiple’ questions the syntax is a bit different and you can actually specify relevant conditions for each one of the options selected. Below is an example of how to do this using XLSform.
Excel 'survey' worksheet
type | name | label | relevant |
select_one owner_list | hh_ownership | A.1.10 This house is … |
|
text | hh_ownership_other | A.1.10.0 Please specify other | ${hh_ownership} = 'other' |
select_multiple water_sources | hh_water_source | B.1.1 Where do you get the water you use at home? |
|
text | hh_water_source_other | B.1.1.0 Please specify other | selected(${hh_water_source}, ‘other’) |
decimal | hh_bottle_expenditure | B.1.2 How much do you spend on bottled water per month | selected(${hh_water_source}, ‘bottled’) |
Excel 'choices' worksheet
list name | name | label |
ownership_list | own | Owned |
ownership_list | rent | Rented/Leased |
ownership_list | rent_free | Rent-Free |
ownership_list | informal | Informal Settlement |
ownership_list | other | Other |
|
|
|
water_sources | house_tap | Household tap |
water_sources | private_tap | Private tap in yard |
water_sources | public_tap | Public/shared standpipe |
water_sources | neighbor_tap | Neighbor's tap |
water_sources | bottled | Purchased bottle water |
water_sources | rain_collect | Rain water collection |
water_sources | river | River/Stream/Creek |
water_sources | spring | Spring |
water_sources | ref_station | Refiling station |
water_sources | other | Other |
If in question A.1.10 the interviewee answers ‘Other’ then the phone will not move to question B.1.1 but will present question A.1.10.0 Please specify other. The content in the ‘relevant’ column: ${hh_ownership} = 'other' actually means: “only ask this question if the answer to the question with ‘name’ equal to hh_ownership was the choice alternative with a ‘name’ equal to other.
If in question “B.1.1 Where do you get the water you use at home?” the interviewee selects the option “Bottled Water” regardless of what else was selected, then question B.1.2 should be presented on the phone, if not then it should be skipped. The content in the ‘relevant’ column: selected(${hh_water_source}, ‘bottled’) actually means: “only ask this question if the choice alternative with a ‘name’ equal to ‘bottled’ was selected amongst any other options in the question with ‘name’ equal to hh_water_source was the choice alternative with a ‘name’ equal to other.
Ask if relevant:
Skip if not relevant:
You will learn how to skip more than one question at a time later on.
Important to remember:
For skip logic based on ‘select_one’ questions, you can use the = (equals) construct (see the living arrangement question above). For skip logic based on ‘select_multiple’ questions, you can use the selected() function, as seen in the water sources question.
For more on proper XPath expressions, see this page: https://bitbucket.org/javarosa/javarosa/wiki/buildxforms
For multiple choice options, frequently, especially in pilot phases, survey authors include an “other” option that has to be specified if selected. XLSForm has a shortcut to make this easier (for English Language surveys; unfortunately, this feature is not available for multiple languages):
Excel 'survey' worksheet
type | name | label | relevant |
select_one owner_list or_other | hh_ownership | A.1.10 This house is … |
|
Excel 'choices' worksheet
list name | name | label |
ownership_list | own | Owned |
ownership_list | rent | Rented/Leased |
ownership_list | rent_free | Rent-Free |
ownership_list | informal | Informal Settlement |
instead of
Excel 'survey' worksheet
type | name | label | relevant |
select_one owner_list | hh_ownership | A.1.10 This house is … |
|
text | hh_ownership_other | A.1.10.0 Please specify other | ${hh_ownership} = 'other' |
Excel 'choices' worksheet
list name | name | label |
ownership_list | own | Owned |
ownership_list | rent | Rented/Leased |
ownership_list | rent_free | Rent-Free |
ownership_list | informal | Informal Settlement |
ownership_list | other | Other |
Important to remember:
Grouping questions has several advantages, the most immediate one is that the group name will be presented to the enumerator as small text before each question. This can be helpful to remember in which section of the survey we are. Other advantages of groups will be explored in the next sections. To create a group of questions try the following:
Excel Excel 'survey' worksheet
type | name | label |
begin group | sectionB | Section B - Water use practices |
note | note_section_b | Section B - Water use practices. This section is composed of 23 questions related to the houshold usage of water. |
select_multiple water_sources | hh_water_source | B.1.1 Where do you get the water you use at home? |
text | hh_water_source_other | B.1.1.0 Please specify other |
decimal | hh_bottle_expenditure | B.1.2 How much do you spend on bottled water per month |
end group |
|
|
Using groups adds the ‘label’ content of the group construct to all questions pertaining to that group.
Using groups also changes how the survey data file columns are named. The group ‘name’ now precedes the question ‘name’ with a ‘/’ in between. Like “sectionB/hh_water_source”.
If you have subsections within your sections, groups can be nested very easily. Here is an example, where we have grouped all questions asked to the respondent within sectionA:
Excel Excel 'survey' worksheet
type | name | label |
begin group | sectionA | Section A - Household Information |
note | note_consent | Administer informed consent. If subject agrees to participate, proceed to questionnaire. |
text | interviewer | A.1.0 Interviewer name |
integer | hh_id | A.1.1 Enter the household number |
geopoint | hh_location | A.1.2 Collect the GPS coordinates of this household |
begin group | respondent_questions | Respondent |
image | hh_photo | A.1.3 Take a picture of the structure where the household lives |
text | respondent_name | A.1.4 Enter the full name of the respondent |
date | respondent_dob | A.1.5 Enter the date of birth of the respondent |
integer | respondent_age | A.1.6 Enter the age of the respondent |
select_one gender_list | respondent_gender | A.1.7 Enter the gender of the respondent |
end group |
|
|
integer | hh_people_nb | A.1.8 How many people live in this household? |
integer | hh_adult_nb | A.1.9 How many adults live in this household? |
select_one owner_list | hh_ownership | A.1.10 This house is … |
text | hh_ownership_other | A.1.10.0 Please specify other |
end group |
|
|
Notice how the nested group is shown on the phone with the parent group ‘label’ and the ‘>’ character in the group text area now. This can be useful for subsections, just remember not to put too many nested groups or your phone screen will become illegible.
In the survey data result file notice how nested groups add the ‘label’ of the parent group to the column name for that variable. The answers for question “respondent_age”are now under “sectionA/respondent_questions/respondent_age”.
One of the most useful uses of groups is to skip a whole bunch of questions at once. In our survey, we only want to ask questions in section B if the respondent is over 18 years of age.
We will need a relevant condition telling ODK the following: “Only present this question if the answer to the question ‘respondent_age’ was larger than or equal to 18” (${respondent_age} >= 18). The following example would skip questions B.1.1, B.1.1.0 and B.1.2 if the respondent is less than 18 years of age.
Excel 'survey' worksheet
type | name | label | relevant |
integer | respondent_age | A.1.6 Enter the age of the respondent |
|
begin group | sectionB | Section B - Water use practices | ${respondent_age} >= 18 |
note | note_section_b | Section B - Water use practices. This section is composed of 23 questions related to the household usage of water. |
|
select_multiple water_sources | hh_water_source | B.1.1 Where do you get the water you use at home? |
|
text | hh_water_source_other | B.1.1.0 Please specify other | selected(${hh_water_source}, ‘other’) |
decimal | hh_bottle_expenditure | B.1.2 How much do you spend on bottled water per month | selected(${hh_water_source}, ‘bottled’) |
end group |
|
|
Without the use of groups you would need to add the relevant condition in each row. This would be cumbersome and difficult to maintain for the survey author. Using groups is a much more efficient way of performing this task.
You might also want to create a repeating set of questions. In our case we might be interested in the age and gender of every member of the household. We cannot do this with existing tools, because we don’t know the number of household members in advance. To do this, we use the following construct:
Excel 'survey' worksheet
type | name | label |
integer | hh_people_nb | A.1.8 How many people live in this household? |
integer | hh_adult_nb | A.1.9 How many adults live in this household? |
begin repeat | household_member | A.1.10 Household Member |
integer | hh_member_age | A.1.10.1 Enter the age of this member |
select_one gender_list | hh_member_gender | A.1.10.2 Enter the gender of this member |
end repeat |
|
|
select_one owner_list | hh_ownership | A.1.11 This house is … |
text | hh_ownership_other | A.1.11.0 Please specify other |
This is a list of household members, with the age and gender of each household member.
The phone will ask the name and age of the first household member, and then ask if the enumerator wants to Add a new ‘Household Member’ group. If the enumerator responds with a ‘yes’ then the name and age of the second household member will be asked, and so on.
Notice that in the group text area there is a counter (a number between parentheses) that tells us how many household members we have added.
Note that the exported data for repeats comes in a variety of formats depending on the export you perform, and data within repeats can be difficult to perform analysis with. We recommend using repeats with care, and constructing a methodology around the use of that repeat data before doing large scale data collection with repeats.
We will walk you through a short example of repeat exports, but please use repeats only after you understand them fully. Let’s say that one of our households had three inhabitants and the other only one. The data would look like this:
The data results for the repeating group are stored in a different sheet in the survey results file (see the second picture; ‘household_member’ sheet). The name of the sheet was derived by the name of this “begin repeat” question in your XLSform.
Explaining in detail what is happening is actually a relational databases 101 course which is not what we want you to learn. Just know that there is a new results sheet and that each row corresponds to one repeating group element. How do we know to which household is each household memeber linked to? Look at the _parent_table_name and the _parent_index column, these columns were automatically generated by Formhub. They basically are telling us that this household member belongs to a household that has index number 1 or 2 in the household_survey worksheet. In this case, the _index column in the household_survey worksheet corresponds to _parent_index in the household_member file.
A data analyst with database abilities will find no problem in using this information. If you have an experienced database manager then go ahead and use begin repeat (with some practice beforehand). If not, we recommend you avoid using this construct and just create multiple questions.
Note: Using nested groups with a field list appearance will crash the ODK Collect software.
ODK Collect allows for certain changes in the formatting of smartphone screens, for example you might want to show multiple choice select_one question as a list box instead of a long list of checkboxes. You might also want to present two questions on the same screen on the phone. Let’s say question B.1.2 is unitless and you want the respondent to specify the currency of expenditure on bottled water. Up to now we would have done this as follows.
Excel 'survey' worksheet
type | name | label |
decimal | hh_bottle_expenditure | B.1.2 How much do you spend on bottled water per month |
select_one currencies | hh_bottle_currency | Currency (specify one) |
Excel 'choices' worksheet
list name | name | label |
currencies | dollars | USD |
currencies | local | Local currency |
which would have shown:
Here is how you would get them to show up on the same screen:
Excel 'survey' worksheet
type | name | label | appearance |
begin group | hh_bottle_group | B.1.2 Bottled water expenditure | field-list |
decimal | hh_bottle_expenditure | B.1.2.1 How much do you spend on bottled water per month |
|
select_one currencies | hh_bottle_currency | Currency (specify one) |
|
end group |
|
|
|
Now, the questions are shown one on top of the other separated by a gray line.
Now let’s use the table-list appearance widget to present the currency choices as an horizontal, more compact, list.
Excel 'survey' worksheet
type | name | label | appearance |
begin group | hh_bottle_group | B.1.2 Bottled water expenditure | table-list |
decimal | hh_bottle_expenditure | B.1.2.1 How much do you spend on bottled water per month |
|
select_one currencies | hh_bottle_currency | Currency (specify one) |
|
end group |
|
|
|
Formatting can also be done at the question level. For example, the applying the minimal “widget” to the select_one currency question will make it show up as a dropdown list:
Excel 'survey' worksheet
type | name | label | appearance |
begin group | hh_bottle_group | B.1.2 Bottled water expenditure | field-list |
decimal | hh_bottle_expenditure | B.1.2.1 How much do you spend on bottled water per month |
|
select_one currencies | hh_bottle_currency | Currency (specify one) | minimal |
end group |
|
|
|
ODK Collect offers many additional widgets that can change the survey formatting. Please look at the ODK_1_1_7_widgets xlsForm that we have shared on the Formhub University account. The Formhub University is a public account on the site that presents advanced features in the form of examples so you can start making your surveys fit your exact needs. If you would like us to add or change forms please let us know.
Your survey can perform dynamic calculations using the values of preceding questions. ODK Collect allows the use of a ‘calculate’ type, similar to any other question type but acts as a hidden field that will not show on the Android smartphone screen. The calculation can be referenced like any other question using the ${name_of_calculated_field} expression.
For example, in our survey, we want to calculate the number of children in the household and shown it to the surveyor. We will require to use the ’calculation’ column. Once a value has been calculated it is not presented to the enumerator directly but can be presented in a note, as part of the label for next questions or as a conditional construct (constraint or relevant).
In this case, we want to calculate the number of children in the household based on the difference between the total number of people in the household and the number of adults in the household. We also want to present the results to the enumerator in a note. Our Excel 'survey' worksheet will look like this:
Excel 'survey' worksheet
type | name | label | calculation |
integer | hh_people_nb | A.1.8 How many people live in this household? |
|
integer | hh_adult_nb | A.1.9 How many adults live in this household? |
|
calculate | hh_children_nb | (Label is not required because calculates do not display any content.) | ${hh_people_nb} - ${hh_adult_nb} |
note | hh_summary_nb_display | Confirm that the household is composed of ${hh_adult_nb} adults and ${hh_children_nb} children |
|
The survey goes directly from question A.1.9 to the note. Behind the curtains the calculation was performed and saved in the variable ‘hh_children_nb’. Notice how the note is presenting dynamic content and in particular notice that the enumerator never asked or entered any data about the number of children, the calculate field determined there were 8 children in this household.
The survey result data does not make any distinction between the data entered manually and a calculated variable.
More details about the language that javarosa uses, and the operators that are available are on this page.
Caveat: + - * correspond to add, subtract, and multiply. Division however, is special, and you
need to use the word “div” to do division (for example, the ratio of households in the household
could be calculated by the formula: ${hh_adult_nb} div ${hh_people_nb}). This is a decision made by the xpath language, and unfortunately, we cannot do much with it in XLSform at the moment.
Binding conditions and formulas can be as complex as you want. You can combine and, or and not logical operators with arithmetic operators. Our question B.1.1 allows to select multiple options for the sources of water, what if the respondent answers “I don’t know”. In this case we should probably add an additional choice with the “I don’t know option” but this adds one issue. The surveyor might check House tap and Bottled water and also check the I don’t know check box. This would not make much sense, you could add a condition to avoid this situation. The following example presents this situation. For simplicity purposes we have removed some of the options that we previously had defined. We added the ‘dk’ option at the end of the choices list.
Excel 'choices' worksheet
list name | name | label |
water_sources | house_tap | Household tap |
water_sources | bottled | Purchased bottle water |
water_sources | other | Other |
water_sources | dk | I don't know |
Excel 'survey' worksheet
type | name | label | constraint |
select_multiple water_sources | hh_water_source | B.1.1 Where do you get the water you use at home? | not ( selected(., 'dk') and ( selected(., 'house_tap') or selected(.,'bottled') or selected(.,'other') ) ) |
Notice how the constraint condition becomes quite long. This constraint is telling ODK Collect to allow the surveyor to move forward only if the combination of selected items is not I don’t know and any other option.
Now we want to present the interviewee with a series of selection questions, where the answer depends on a previous selection. The classic example is asking someone to enter a district in a country. But you don’t want the user to deal with 777 options at once (example from Nigeria), so you present enumerators with a question asking them about the zone they are in (6 in the country), the state they are in depending on the zone (less than 10 per zone), and then, finally, the LGA, depending on the state they are in.
This possible with xlsforms without cascading select, but it requires you to write out one question per state in this case (36 different questions), with a fairly complex criteria in the relevant column. And even then, the end information you care about is what zone, state, and LGA (local government authority) the enumerator has filled out information about. Calculating this information requires writing a gigantic if statement, with the same number of branches as there are states in the country! This is quite an unimaginable process. We now present a syntax addition to overcome this issue and collect cascading-select information.
Since xlsforms are computer processed, we introduced a feature called cascading-select, where the user is able to simply list the cascading relationships. In a new worksheet called ‘cascades’ you create a new column ‘name’ and then an additional column for each level of the cascading hierarchy (if you are working with continents and countries you would have three columns ‘name’ ‘continent’ and ‘country’). In our case we have three levels so we would need a total of four columns ‘name’ ‘zone’ ‘state’ and ‘lga’. The first non-header row is still a special row and has to start with the word ‘label’ under column ‘name’ then under each additional column you can enter some free-text with the text you want to display to the enumerator for each hierarchy level. The next rows will contain the relationships between the different elements in your hierarchy. State 1 for example has two LGA’s (1 and 2) and belongs to Zone 1. Notice how every element contains ‘choice_label’ under the ‘name’ column. Also notice that if there are 777 LGA’s in the country you will need 777 rows regardless of the number of Zones or States, the lower level in the hierarchy stipulates the size of the sheet.
Excel 'cascades' worksheet
name | zone | state | lga |
label | Choose your zone: | Choose your state: | Choose your lga: |
choice_label | Zone 1 | State 1 | LGA 1 |
choice_label | Zone 1 | State 1 | LGA 2 |
choice_label | Zone 1 | State 2 | LGA 3 |
choice_label | Zone 1 | State 2 | LGA 4 |
choice_label | Zone 2 | State 3 | LGA 5 |
choice_label | Zone 2 | State 3 | LGA 6 |
choice_label | Zone 2 | State 4 | LGA 1 |
choice_label | Zone 2 | State 4 | LGA 2 |
I the survey sheet we will add a new row of type ‘cascading_select’ followed by the name of the column of the lowest level (rightmost column in the Excel ‘cascades’ worksheet) of the hierarchy (in this case ‘lga’). The ‘name’ of this row element is user defined, in this case ‘mylga’. For the content of the ‘label’ column in the Excel ‘survey’ sheet it is important to notice that the ‘cascading_select’ works differently than any of the question types we have been working with in the sense that a ‘cascading_select’ does not correspond to just one question but to as many questions as there are hierarchy levels defined in the Excel ‘cascades’ worksheet. In this case we have three levels so we will have three questions and thus three labels to display on the Android smartphone screen. This might sound confusing but it’s just how the cascading select works. The ‘label’ column on the Excel ‘survey’ sheet is overridden by the ‘label’ row on the Excel ‘cascades’ sheet. If you are confused just try it out and test it. You can then replace this example with your own cascading select.
Excel 'survey' worksheet
type | name |
cascading_select lga | mylga |
This will present the user with three questions (which one shows up totally depends on previous input).
If you want to know the zone, the state, or the LGA of your submission, you can just look for variables called mylga, mylga_state, and mylga_zone in the resulting data files. The “mylga” bit was obtained from the name assigned to the cascading select in the survey sheet, and the “state” and the “zone” respectively from the ‘name’ row in the Excel ‘cascades’ sheet.
The above is a simple test case, and may not seem impressive. But when you’re dealing with 777 LGAs, like Nigeria, the feature can be very effective.
[1] Try it our yourself! Go to the forms page on formhub, search for “cascad” , and clone any of the forms that show up there. The one I have worked above is available here or here. ↩
You can also make questions that show the user an ’image’, a ’video or an ’audio’ file as a way of asking a question. In order to do this, you will need to put the media in /odk/formname-media folder on your phone, and reference them from xlsform in a media column.
There is an easy way to show a logo at the top/beginning of your form. Simply upload an image file in the png format called form_logo.png. You can do this in the Media section of the form settings page.
Formhub allows forms to be filled via Text Messages (SMS) in addition to ODK (Android), Enketo (Web Online Form) and other methods.
SMS are useful for cases where you can’t use the other methods ; either because you don’t have access to the mobile data network (GPRS, EDGE, etc) or because you only have access to low-end phones.
Important: SMS Support is meant as a degraded yet usable way of surveying. If you can use other methods, please do.
SMS Support comes with specificities (and limitations) which must be known prior to designing the form:
Please download the example form with SMS Support to learn by example.
sms_keyword | proposed keyword (first word of SMS) to match this form. Defaults to id_string and can be changed afterwards. |
sms_separator | character used to separate groups. Defaults to “+”. Change to “#” if you want to allow media fields. |
sms_allow_media | Whether to accept media (photo, audio, video) question types. Defaults to False. |
sms_date_format | Format date fields must be sent in. Uses python’s strftime syntax. Defaults to %Y-%m-%d. |
sms_datetime_format | Format datetime fields must be sent in. Uses python’s strftime syntax. Defaults to %Y-%m-%d-%H:%M. |
sms_response | Custom message for successful submission. It can include “{{ id }}” which is the unique ID of that submission. Defaults to “[SUCCESS] Your submission has been accepted. It's ID is {{ id }}.” |
Things to Know
text | Text input. |
integer | Integer (ie, whole number) input. |
decimal | Decimal input. |
select_one [options] | Multiple choice question; only one answer can be selected (the [] notation indicates that you will need to replace [options] by a user defined choice list variable name). |
select_multiple [options] | Multiple choice question; multiple answers can be selected (the [] notation indicates that you will need to replace [options] by a user defined choice list variable name). |
note | Display a note on the screen, takes no input. |
geopoint | Collect GPS coordinates. |
image | Take a photograph. |
barcode | Scan a barcode, requires the barcode scanner app is installed. |
date | Date input. |
datetime | Accepts a date and a time input. |
audio | Take an audio recording. |
video | Take a video recording. |
calculate | Perform a calculation; see “calculates” below. |
Your first stop for more resources should be formhub university, an account on formhub where we have put some example forms that delve deeper into some of the elements we have reviewed so far. You can download the forms to your Android smartphone and inspect them to learn about the most advanced features.
The Formhub community is very active and growing as users around the world start using the system and asking for improvements. We have created a google group where you can ask questions, review other users conversations about the most advanced features and solutions to common problems and of course meet people from all over the world who are working on mobile rapid data-gathering projects. Become a member of formhub-users@googlegroups.com and let us know about your experience with the platform.
Another important resource is our blog, Formhub is an ongoing project and every month we publish a roundup of the improvements made to the system. If you want to learn what’s new this is the go to place.
The following resources will act as important reference once you have gone through the contents of this document:
The XLSform standard document can guide you through the specific input types, column headers, and so on that are legitimate syntax in XLSforms.
If you want to dig in deeper to understand xforms and go beyond xlsforms, here are some resources to understand them:
Form guidelines: http://opendatakit.org/help/form-design/
The javarosa formula language (used in constraint, relevant, and calculation expressions): http://opendatakit.org/help/form-design/binding/ (See section under “operators”)
Sample forms: http://formhub.org/formhub_u http://formhub.org/forms