Importing data into OpenRefine
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How do I get data into OpenRefine?
Objectives
Successfully import data into OpenRefine
Importing data
What kinds of data files can I import?
There are several options for getting your data set into OpenRefine. You can upload or import files in a variety of formats including:
- TSV (tab-separated values)
- CSV (comma-separated values)
- Excel
- JSON (JavaScript Object Notation)
- XML
- Google Spreadsheet
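As a rough illustration of how the first two formats relate, the sketch below parses the same small table from CSV text and from TSV text using Python's standard `csv` module. The sample rows are made up for this example and are not taken from the workshop data set; the helper name `read_rows` is hypothetical.

```python
import csv
import io

# Made-up sample rows, not taken from the workshop data set.
CSV_TEXT = 'id,title\n1,Portrait of a lady\n2,"View of London, from the river"\n'
TSV_TEXT = "id\ttitle\n1\tPortrait of a lady\n2\tView of London, from the river\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_rows(CSV_TEXT, ",")
tsv_rows = read_rows(TSV_TEXT, "\t")

# Both formats describe the same table; only the separator differs.
print(csv_rows == tsv_rows)
```

Note that the CSV version has to quote the title containing a comma, while the TSV version does not, because its separator is a tab.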
Create your first OpenRefine project (using provided data)
To import the data for the exercise below, follow the instructions in Setup to run OpenRefine. The archive-specific data we will use for this workshop is information extracted from [a copyright register which contains examples of Victorian photography sent to Stationers’ Hall in London]. The dataset was created by the UK National Archives and is called ‘copy1-data_edit.csv’.
NOTE: If OpenRefine does not open in a browser window, open your browser and go to http://127.0.0.1:3333/ to reach the OpenRefine interface.
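If the page does not load, it can help to check whether anything is answering at that address at all. The following is a minimal sketch using only Python's standard library; the function name `is_reachable` and the two-second timeout are assumptions made for this example, and the check is not part of OpenRefine itself.

```python
import urllib.error
import urllib.request

# Address given in the lesson text.
OPENREFINE_URL = "http://127.0.0.1:3333/"

# Bypass any system proxy settings, since the address is on this computer.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A server answered, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Nothing is listening there, or the connection timed out.
        return False


if __name__ == "__main__":
    if is_reachable(OPENREFINE_URL):
        print("Something is answering at", OPENREFINE_URL)
    else:
        print("No response; check that OpenRefine is running.")
```

A `True` result only shows that some HTTP server is listening on that port; it does not confirm that the server is OpenRefine.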
- Once OpenRefine is launched in your browser, click ‘Create Project’ from the left-hand menu and select ‘Get data from This Computer’
- Click ‘Choose Files’ (or ‘Browse’, depending on your setup) and locate the file you downloaded, called ‘copy1-data_edit.csv’
- Click ‘Next >>’. The next screen gives you options to ensure the data is imported into OpenRefine correctly. The options vary depending on the type of data you are importing.
- Click in the ‘Character encoding’ box and set it to UTF-8
- Ensure the first row isn’t used to create the column headings by unchecking the box ‘Parse next 1 line(s) as column headers’
- Make sure the ‘Parse cell text into numbers, dates, ...’ box is not checked, so OpenRefine doesn’t try to automatically detect numbers
- The ‘Project Name’ box in the upper right corner will default to the title of your imported file. Click in the ‘Project Name’ box to give your project a different name, if desired.
- Once you are happy, click the ‘Create Project >>’ button at the top right of the screen. This will create the project and open it for you. Projects are saved as you work on them, so there is no need to save copies as you go along.
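To picture what the three import options in the steps above control, here is a small Python analogy. It is a sketch only and does not show how OpenRefine works internally. The function and parameter names (`parse_csv_bytes`, `first_row_is_header`, `parse_numbers`), the placeholder column names, and the sample data are all hypothetical and chosen for this example.

```python
import csv
import io

# Made-up sample bytes, not taken from copy1-data_edit.csv.
SAMPLE = "0042,Café scene,1884\n0043,Street view,1885\n".encode("utf-8")


def to_number(cell):
    """Return the cell as an int or float if it looks numeric, else unchanged."""
    for convert in (int, float):
        try:
            return convert(cell)
        except ValueError:
            continue
    return cell


def parse_csv_bytes(raw, encoding="utf-8", first_row_is_header=False,
                    parse_numbers=False):
    """Decode and parse CSV bytes, returning (headers, rows)."""
    # Character encoding: turn the raw bytes into text.
    text = raw.decode(encoding)
    rows = list(csv.reader(io.StringIO(text)))

    if first_row_is_header and rows:
        # Header option on: the first row supplies the column names.
        headers, rows = rows[0], rows[1:]
    else:
        # Header option off: every row is data, so use placeholder names
        # (the naming scheme here is an assumption of this sketch).
        width = max((len(row) for row in rows), default=0)
        headers = [f"Column {i + 1}" for i in range(width)]

    if parse_numbers:
        # Number parsing on: numeric-looking text becomes a number.
        rows = [[to_number(cell) for cell in row] for row in rows]
    return headers, rows


headers, rows = parse_csv_bytes(SAMPLE)
print(headers)   # ['Column 1', 'Column 2', 'Column 3']
print(rows[0])   # ['0042', 'Café scene', '1884']

_, numeric_rows = parse_csv_bytes(SAMPLE, parse_numbers=True)
print(numeric_rows[0])   # [42, 'Café scene', 1884]
```

In the last line the reference `0042` has become the number `42` and lost its leading zeros, which is one reason to leave number detection switched off when identifiers should stay as text.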
To open an existing project in OpenRefine, click ‘Open Project’ from the main OpenRefine screen (in the left-hand menu). You will then see a list of the existing projects and can click on a project’s name to open it.
Key Points
Use the ‘Create Project’ option to import data
You can control how data is imported using options on the import screen