Recapping the 7th International Conference on Big Data

Global Challenges and the Importance of Relevant and Timely Data

SDGCounting
SDG Counting

Global Economic Recovery — Food Security — Access to Relevant Data

These were the main themes driving the conversation in Yogyakarta, Indonesia at the UN’s International Conference on Big Data. This event, hosted by the Big Data Division of UNSTATS, offered participants diverse opportunities, ranging from expert dialogues and discussions of emerging tools and approaches to hands-on workshops and a hackathon to drive innovation.

The five-day conference worked towards four objectives:

  • Identifying global challenges related to the provision of official statistics and elaborating the importance of relevant and timely data to support policy decisions
  • Employing statistical methods and statistical analysis using data science
  • Employing big data as a modern data source for official statistics
  • Disseminating the latest research on data science

In this post, we provide summaries of all sessions made available from the conference, including access to nearly 60 hours of video. Be sure to also check out the official conference website for more information.

Session Overviews

The Big Data Conference took place over five days and offered multiple concurrent sessions. Below you can access video of all the sessions and read our summaries.

Monday, 07 November, Plenary Sessions

  • Opening Session
  • Sustainability and Global Economic Recovery
  • Food Security
  • Access to Relevant Data

Tuesday, 08 November, Plenary Sessions

  • 10 Years of Data Innovation: What is the Next Normal?
  • UN Big Data Hackathon and UN PET Lab Competition
  • Plenary Sessions: Capacity Development and Regional Hubs, Big Data and the SDGs, Mobile Phone Data for Official Statistics, Data Access and Data Science

Wednesday, 09 November, Room A

  • UN PET Lab
  • Earth Observation Data

Wednesday, 09 November, Room B

  • AIS Data

Thursday, 10 November, Room A

  • Earth Observation Data
  • Mobile Phone Data

Thursday, 10 November, Room B

  • Data Access and Machine Learning
  • Machine Learning

Friday, 11 November, Room A

  • Mobile Phone Data

Friday, 11 November, Room B and Plenary Sessions

  • Machine Learning
  • Closing Session

Note: In keeping with the topic of working with large data sources, the SDGCounting Team has utilized natural language processing tools to help summarize these sessions.

Day 1 (Plenary)

Session Agenda

  • Opening Session
  • Sustainability and Global Economic Recovery
  • Food Security
  • Access to Relevant Data

Key Words

data, policymakers, big data, economic statistics, official statistics, women, policy, challenges, inflation, food security, access, information security.

Outline

  • Welcome
  • Introducing the keynote speaker.
  • Young statisticians from all over the world taking part in the hackathon.
  • Video
  • Big Data, Big Data, and Big Data.
  • Dr. Hamza Ali Malik.
  • GDP per capita income and inequality since 1980.
  • The global economy is heading towards further economic slowdown in 2020.
  • The impact of fiscal measures on the Asia Pacific economy.
  • Fiscal measures and multilateral approaches to create fiscal space.
  • The importance of data and how policymakers can use it.
  • Why we’re not doing enough to protect ecosystems.
  • The role of government in bridging the gap between the private sector and the public sector.
  • How do you preserve the diversity of approaches while having a harmonized approach?
  • The second high-level panel is about food security.
  • What is the role of big data in informing food security policy?
  • What we need is to be able to detect inequality while recognizing that we live in a diverse world.
  • What are some of the key challenges that are still to be solved?
  • Why we need to rethink the ethics of dealing with data next.
  • What is the next agriculture census in Indonesia?
  • How does food security relate to the international level?
  • What is the solution for monitoring the food security situation in a more dynamic manner?
  • The need for more data on food insecurity in Indonesia.
  • Why the central government decided to create the data center inside the Federal Administration.
  • What is the role of an official statistician in the future?
  • The key step in any data integration journey is access to relevant data.
  • What are the challenges? What are the hurdles? What is the best approach?
  • Communicating the value of data to the public.
  • The UN’s privacy-enhancing technology task team has been at the center of a lot of attention.
  • What are the risks associated with bringing all of the data together?
  • Closing

Day 2 (Plenary)

Session Agenda

  • 10 Years of Data Innovation: What is the Next Normal?
  • UN Big Data Hackathon and UN PET Lab Competition
  • Plenary Sessions: Capacity Development and Regional Hubs, Big Data and the SDGs, Mobile Phone Data for Official Statistics, Data Access and Data Science

Key Words

data, task, statistics, regional hubs, science, official statistics, team, hackathon, big data, sdgs, challenges, statisticians, mpd, quality, private sector, methodology, mobile phone, access, ict.

Outline

  • Introduction
  • What the task teams have been up to in 2019.
  • The United Nations Global Platform for Big Data was officially established.
  • What’s happening in the Asia Pacific?
  • What’s happening on the Global Platform?
  • What is the role of regional hubs? What is the need for a more sustained and sustainable form of collaboration?
  • The importance of building networks and getting the views of other organizations.
  • Who should be responsible for making the connection with regional hubs?
  • How do we strengthen our collaboration with the geospatial community?
  • What are the benefits of having a global digital twin?
  • What are the priorities from different countries and how can we respond to that?
  • UNICEF’s global geospatial information management group (GIS) has a completely different community than the GIS division.
  • What’s the purpose of the Data Science Leading Network?
  • What is the purpose of the bureau? What are its key functions and scope?
  • How can we make sure that all these fear people are using data science?
  • What are the burning issues for a data science group within a statistical office?
  • What is the role of data science in government?
  • Official statistics in big data is a huge topic.
  • The importance of getting better access to data outside of your community.
  • Who needs an accreditation for such a solution?
  • What is the sensitivity of health data and how can it be protected?
  • How the PET Lab will help the data producer understand privacy.
  • Do we want to have a Yogyakarta declaration or not?
  • What should be in the Yogyakarta declaration?
  • How to further discuss the creation of a data science hub in Asia.
  • Can new technologies and data be used to help Ukraine’s statistics?
  • The hackathon is a big deal.
  • What’s happening around the world in the statistics community right now.
  • UN Big Data Hackathon: Brazil Hub.
  • Start of the hackathon.
  • What is the approach to the African Hub?
  • The importance of involving the youth within the ecosystem.
  • What were the key themes that emerged in the data gathering stage?
  • The first starting point for collaboration is to show the importance of the subject.
  • How are we sharing? How are we learning?
  • How are you engaging with academia?
  • What are some of the challenges that you might want to think about when setting up big data projects?
  • What can we learn from the successful application?
  • Indonesia is the top country in the Asia-Pacific region with the most data available for the S&P/IPS.
  • Why we need another source of data, not just MPD.
  • Ministry of Digital Development and Communication (MDG).
  • The use of scanner data for rich price statistics.
  • Do you have any questions or comments or would you like to present your experiences?
  • What is the cost of using these data sources as observation?
  • What are the key resources that you needed to implement these projects?
  • The basics of MPD data.
  • What country pays during the introduction of the big data for statistical purposes?
  • How do you deal with privacy, confidentiality, and competitive concerns?
  • What is the role of academia in advancing the use of MPD in NSOs?
  • How these challenges can be addressed as a methodology.
  • What are the free tools that NSOs can use to explore mobile phone data?
  • National coordination in terms of data sharing.
  • When do you choose whether to get the data from the telephone providers or do the analysis ourselves?
  • The panel discussion about data access and data science.
  • How our data has been used in other countries.
  • How do you identify the seven steps from acquisition to ingestion of data?
  • Challenges that resulted from the project.
  • What is the importance of quality?
  • Quality assurance of the data.
  • Is it time to rethink the role of official statistics?

Day 3 (Room A)

Session Agenda

  • UN PET Lab
  • Earth Observation Data

Key Words

data, ai, code, indicators, access, statistics, information, official statistics, shipping, questions, methods, visualization, clusters, process, algorithm.

Outline

  • Introduction to the workshop.
  • How much data do you have?
  • Australia has a project to look at port congestion and also the actual destination of the ships.
  • What is the coverage of the data source in terms of type of ship?
  • What will you do with the data if you link it to other information?
  • What is AIS data and how does it work?
  • How to connect to the D365 server.
  • JupyterLab dashboard.
  • How to extract data into a Spark DataFrame.
  • How to filter out your results.
  • Why it’s important to make the best statistics available.
  • The importance of having a geospatial expert in your team.
  • Where did the geometries come from?
  • Method #1: Boundary crossing method.
  • What is a keyring? How does it work?
  • The importance of benchmarking against the official statistics.
  • How do you envision the final step of integrating these data into the official statistics?
  • Demonstration #1: Writing data into an S3 bucket with Python.
  • Fetching data from the FITCH.
  • How to test your data in development.
  • What are we going to do with this data?
  • What is the data that comes out of it?
  • The heat map of Odessa.
  • What are the pros and cons of each clustering algorithm?
  • Using H3 indices to reduce the complexity of clustering using big data.
  • How to identify the minimum number of points in your model.
  • Caveats and limitations of the method.
  • How about stability? How do you make sure your statistics are still applicable?
  • How can we expand the UN Global platform?
  • The importance of having an outline of the problem to start with.

Day 3 (Room B)

Session Agenda

  • AIS Data

Key Words

data, differential privacy, dataset, statistics, noise, unhcr, federated, workshop, enclaves, epsilon, map, fao, images, geospatial, official statistics, observation, accuracy, monitoring, method, earth observation data.

Outline

  • Introduction to today’s program.
  • What is the role of official statistics in the future?
  • Are there ways to make this automated and to have controls and guarantees in place so that the information you give out cannot be reverse engineered?
  • What is a secure enclave and how does it work?
  • Why federated learning is not reproducible.
  • Do you report the amount of noise that’s being added on?
  • Approach in practice.
  • What does the use of privacy enhancing technologies mean from a legal perspective?
  • How are these technologies already being used?
  • What are the use cases for privacy enhancing technologies at NSO?
  • What are the three types of data ecosystems?
  • What are the three generic use cases that Jack presents?
  • What is federated learning and how does it work?
  • Introduction of data-curation at UNHCR.
  • What’s the best way to show data?
  • What is Smart Noise? How does it work?
  • How the data was prepared for the datathon.
  • Difficulties that everyone would face with a hackathon.
  • What does the formula look like?
  • How to use SQL to make better predictions.
  • Presenters for the workshop: Lorenzo De Simone from FAO.
  • Satellite data can be used for a number of things, including mapping of informal settlements.
  • What are the applications of the FAO data?
  • How many images are we using? How many temporary composites are we reusing? Is it a monthly or bi-monthly compilation?
  • The gold standard for achieving the highest possible accuracy is using GPS to reference the borders of the parcels.
  • What happens when you have a satellite image every 16 days?
  • What’s the difference between single data in time series vs. time series data?
  • What is the relevance of land cover and crop type maps to the integrated geospatial information framework?
  • What are the sources of land cover maps? What are the problems?
  • What are some of the main limitations of the current maps?
  • How the field data is used for supervised and unsupervised approaches.
  • How to use K-means clustering to artificially activate and in situ data set.
  • What are the advantages of this solution?
  • The results of the study.
  • What is the difference between pixel-based and object-based models?
  • Is it feasible to use satellite imagery to estimate land cover and also in terms of the different crops that we have?
  • What is the Digital Earth Africa program?
  • Lorenzo gives us a taste of Digital Earth Africa with some very concrete examples.

Day 4 (Room A)

Session Agenda

  • Earth Observation Data
  • Mobile Phone Data

Key Words

data, survey, plot, crop, agriculture, estimates, agricultural, boundaries, courses, area, case, training, malawi, imagery, model, earth observation data, satellite.

Outline

  • Introduction of today’s speakers.
  • Today’s topic: The role that surveys can play in calibrating and validating light-based methods for high-resolution crop area mapping
  • Satellites are now able to observe agricultural plots in smallholder farming systems with clarity that we could not have imagined just a few years ago.
  • What are the main outputs of this research?
  • Survey data and machine learning for the future.
  • The second survey from Malawi is the Fifth Integrated Household Survey, which is implemented in parallel with the IHPS over a slightly longer fieldwork duration.
  • What are the geospatial predictors that we’re using?
  • Results from the training and validation phases.
  • Centroid method outperforms single-point method.
  • What’s next for the research.
  • Looking at the sensitivity of our findings in accordance with the resolution of the satellite imagery.
  • Is there a way to combine cartography efforts with means to create training data for satellite or earth observation imagery?
  • Can an NSO do its own yield estimate? Can it be independent?
  • The responsibility for doing work on agricultural statistics doesn’t always lie with the NSO.
  • Building geospatial training courses based on user profiles.
  • The training approach used to develop a personalized training program.
  • What are the courses that are going to be involved?
  • What level of knowledge do you want to upskill to?

Day 4 (Room B)

Session Agenda

  • Data Access and Machine Learning
  • Machine Learning

Key Words

data, official statistics, website, statistics, machine learning, business, netherlands, companies, internet, poverty, model, indicator, organization, regional hub, satellites.

Outline

  • Introducing the presenters.
  • Welcome to the conference and what to expect.
  • Google’s role in the study and how it compares to traditional statistics.
  • The three categories of the internet economy.
  • Online services as a focal point of the report.
  • The ministry of economic affairs asked us to do something for them.
  • Facebook’s “Covid” survey.
  • What’s the worry that we’ve lost 100 years of data?
  • What you can do with the data you get.
  • How different industries use the data differently.
  • An example of how the pandemic changes business behavior online.
  • Is any data better than no data?
  • Validating the results of the data.
  • How do we maintain the fundamental principles of statistics when we’re using big data?
  • How to get started with the site.
  • What are the different filters you can apply?
  • What is the most widely accepted payment method in Brazilian online stores?
  • How many business sites in Indonesia refer to LinkedIn?
  • How many e-commerce sites were added in Germany in 2022?
  • What’s going on with the whole South Africa on-control?
  • What machine learning and big data mean for your organization.
  • What are the potential use cases of big data for official statistics?
  • What have you done so far in partnership with the UN?
  • What is machine learning and how does it work?
  • Why do we need machine learning for official statistics?
  • What free tools are available for machine learning in NSOs?
  • How can we get more access to data?
  • What does machine learning mean for official statistics?
  • Is it okay to expect the same limitation in the public sector as in the private sector?
  • What is the biggest problem for the machine learning community right now?
  • Poverty in Indonesia and the use of satellite imagery.
  • How the framework is implemented in Indonesia.
  • What are the pros and cons of using satellite imagery for poverty mapping?
  • What else we can do with the poverty map dashboard.
  • What are some of the questions that need to be evaluated?
  • What is the possible country-to-country error in the data?
  • What are the components of the insights and forecasts platform?
  • How does the forecasting model compare against a traditional econometric model?
  • What is machine learning and how can it be implemented in this space?
  • What’s the progress you’re making with your data science area in the Inquity?
  • The readiness of the NSO to implement big data as well as machine learning.

Day 5 (Room A)

Session Agenda

  • Mobile Phone Data

Key Words

data, people, mpd, telco, location, mobile phone, regional hub, countries.

Outline

  • Introduction to the session.
  • The process of flow in the Netherlands.
  • The second step is to check if the results of the algorithm are valid.
  • The first 100-by-100-meter grid.
  • Is there any possibility of using GPS signal location individually as opposed to going to the location of the phone?
  • How do you know if you have the right data?
  • What are the main ingredients of mobile phone data (MPD)?
  • How does the partnership work between BPS and Macell?
  • Do you mainly use cell tower data or use transparence proxies that are used to compress the data?
  • Do you provide the information regarding how you do the data cleaning of cell tower-based data?
  • What is the role of regulators and the ministries in this regard?
  • How do you make sure that when you reveal the data it does not harm the data?
  • How to be adaptable in partnership with the telecom operators.
  • What are the plans for the MPD in Brazil?
  • What’s happening in the MPD.
  • Some of the NSOs would like to share their concerns or plans for using MPD.
  • The class isn’t over yet.
  • Summary of the workshop.

Day 5 (Room B and Plenary)

Session Agenda

  • Machine Learning
  • Closing Session

Key Words

data, regional hub, model, conference, official statistics, mapping, hackathon, participants, machine learning, deep learning, remote sensing.

Outline

  • Introduction to today’s session.
  • The second run of the Machine Learning for Official Statistics Course will be available to everyone soon.
  • Deep learning is a data-driven algorithm.
  • The cloud and shadow detection method.
  • The case of an early rice mapping task in China.
  • How to detect the cloud and remove the cloud from satellite images.
  • The importance of fine-tuning and transfer learning in remote sensing models.
  • What is the purpose of this small group discussion?
  • The four ideas that we have at the moment.
  • Introduction to the rest of the world.
  • What’s the next step?
  • Opening and closing remarks from.
  • The importance of relevant and timely data.
  • Thank you to everyone who helped make this hackathon happen.
  • Big Data for Official Statistics.
  • Close of the ceremony.

SDGCounting is a program of StartingUpGood and tracks the progress of counting and measuring the success of the SDGs. Check us out on Twitter.
