What are 3 potential problems, limitations, or challenges to data collected in the community?

A recent McKinsey survey reported that AI adoption has increased rapidly worldwide (see Figure 1). However, the glass is still half-empty, since almost half of the world has yet to leverage the power of AI.

This gap can be due to the various challenges and barriers businesses face while developing and implementing the technology. Some of those challenges are encountered during the data collection/harvesting phase of developing an AI/ML model.

This article explores 4 data collection challenges and ways to overcome them to streamline the AI development and implementation process for business leaders and developers.

Figure 1. AI adoption from 2020 to 2021

Source: McKinsey

1. Selecting the right dataset

Data is the fuel of an AI/ML model, so determining the dataset is one of the most crucial steps of data collection. A key challenge at this stage is myopic data: data that does not cover the full scope of the project and is not aligned with the real-world tasks the model will perform.

A study by the University of California and Google found that, in the machine learning development community, the majority of the datasets used to train models are reused or borrowed. A borrowed dataset may not match the project objectives, which can result in an inaccurate finished product.

Solution

To overcome this challenge, the following steps can be taken:

  • Assign a dedicated team to data collection. A dedicated team will know the project inside out and will be able to choose the right dataset.
  • Ensure that the team understands the objectives and goals of the project.
  • If prepackaged datasets do not cover the scope of the project, then opt for another data collection method that best suits the project.
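
One way to check whether a reused or prepackaged dataset covers the project scope is to compare the labels it contains against the labels the deployed model must handle. The following is a minimal sketch, not a prescribed method; the function name `coverage_gaps` and the example labels are hypothetical.

```python
def coverage_gaps(dataset_labels, required_labels):
    """Return the required labels that never appear in the dataset, sorted."""
    return sorted(set(required_labels) - set(dataset_labels))


# Hypothetical example: a borrowed image dataset versus the classes
# the deployed model is expected to recognize.
borrowed = ["car", "truck", "car", "bus"]
required = ["car", "truck", "bus", "bicycle", "pedestrian"]

gaps = coverage_gaps(borrowed, required)
print(gaps)  # ['bicycle', 'pedestrian']
```

A non-empty result suggests the dataset is too narrow for the project and that another collection method may be needed for the missing classes.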

2. Avoiding data bias

Collecting biased data can lead to a biased and erroneous AI/ML model and thus should be avoided. 

For instance, if the dataset used to train a patient referral system does not include male patients or patients with lower income levels, it will provide biased and erroneous outcomes when implemented in a real clinic. 

This bias can unintentionally be transferred by the data collector into the AI model. 


Solution

The following steps can be taken to overcome data bias while harvesting data:

  • Ensure that the dataset is comprehensive and all-inclusive. For instance, a quality inspection system must be trained with data on both defective and working items.
  • Ensure that the participants for data collection and revision include people from diverse backgrounds. The dataset must represent the total population on which the AI/ML system will be deployed.
  • Utilize crowdsourcing to expand the range of the data, since it offers fast access to large amounts of human-generated data. Because the data collectors are located in different countries, the resulting datasets tend to be more diverse.
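
A simple representation check can flag groups that are over- or under-represented before training begins. The sketch below compares each group's share in the dataset with a reference share for the deployment population. The function name, the 10-percentage-point tolerance, and the patient records are all illustrative assumptions, and a check like this only covers attributes that were actually recorded.

```python
from collections import Counter


def representation_gaps(records, key, reference_shares, tolerance=0.10):
    """Return groups whose share in `records` differs from the reference
    share by more than `tolerance`, as {group: (observed, expected)}."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = (round(observed, 2), expected)
    return gaps


# Hypothetical patient records that heavily under-represent male patients.
patients = [{"sex": "female"}] * 9 + [{"sex": "male"}] * 1
reference = {"female": 0.5, "male": 0.5}  # assumed population shares

print(representation_gaps(patients, "sex", reference))
# {'female': (0.9, 0.5), 'male': (0.1, 0.5)}
```

An empty result means every listed group falls within the tolerance; it does not prove the dataset is free of bias.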

3. Data protection

This section explains some ethical and legal constraints on data collection.

Not all data is readily and publicly available to use. Some data is sensitive in nature and cannot be accessed easily, which makes it challenging to collect.

For instance, in order to train a computer vision system for radiology, thousands of medical images are required. This type of data can be expensive to collect and can have various ethical constraints attached.

Data collection is not as easy as it used to be. As people and government bodies recognize the risks of data exploitation, they are making more effort to regulate data collection and improve data protection.

Solution

In order to avoid these issues, considering the following questions prior to data collection can be helpful:

  • What data will be collected?
    • To answer this, you need to check what type of data is required. For instance, is it biometric data, such as face images, voice data, or thumbprint scans? This can help clarify the ethical and legal factors to consider.
  • How should legal stipulations (related to the collection of the dataset) be addressed?
    • To answer this question, you need to study the country-specific regulations regarding data collection.
  • How will the data be collected?
    • Different data collection methods have different legal considerations attached to them. For instance, there are certain rules regarding web scraping in different countries.
  • How will the data be stored?
    • Since cyber threats are rising, it is important to consider where the data will be safest. Will the cloud be more efficient, or will physical hard drives be safer?
  • How will the data be used?
    • To answer this question, you need to understand how the data will be used and who within the organization will have access to it, and then communicate this information to the data provider.

Answering these questions and clearly explaining the answers to the participants can make the whole data collection process transparent. It is also important to check the data collection rules set by the relevant regulatory body in the country in which the data is being collected.
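
The five questions above can be tracked as a simple pre-collection checklist so that none is left unanswered before collection starts. This is only an illustrative sketch: the class `CollectionPlan`, its field names, and the sample answers are hypothetical, and filling in the checklist does not by itself establish legal compliance.

```python
from dataclasses import dataclass, fields


@dataclass
class CollectionPlan:
    """One free-text answer per question; an empty string means unanswered."""
    data_types: str = ""              # What data will be collected?
    applicable_regulations: str = ""  # Which legal stipulations apply?
    collection_method: str = ""       # How will the data be collected?
    storage_location: str = ""        # How will the data be stored?
    usage_and_access: str = ""        # How will it be used, and by whom?


def unanswered(plan):
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Hypothetical, partially completed plan.
plan = CollectionPlan(
    data_types="face images",
    collection_method="in-person capture with written consent",
)
print(unanswered(plan))
# ['applicable_regulations', 'storage_location', 'usage_and_access']
```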

4. Underestimating the costs

Large datasets require a large number of data collectors. In this case, the costs can pose a barrier. For instance, if a company opts for in-house data collection for an ML project, it will have to perform the following tasks:

  • Hire a dedicated team of data collectors
  • Ensure the level of diversity and skillsets match the requirements of the project
  • Go through onboarding and training for the data collectors
  • Acquire all relevant resources for data collection
  • Track and manage the progress of data collection tasks from all participants

This process can be unaffordable or even overwhelming for some businesses, which can stall the entire project.

Solution

The following considerations can help overcome this challenge:

  • Consider data collection costs during the planning phase of the AI/ML development project
  • If the costs cannot be adjusted in the budget, consider outsourcing the operation
  • Use prepackaged datasets if the project does not require highly personalized data. These are usually cheaper to purchase.
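
A rough estimate made during the planning phase can show whether in-house collection fits the budget. The sketch below adds up labor, onboarding, and equipment costs. The function name, its parameters, and every figure are hypothetical placeholders, and the estimate leaves out management and progress-tracking overhead.

```python
def in_house_cost(num_collectors, hourly_rate, hours_per_collector,
                  onboarding_cost_per_collector, equipment_cost):
    """Rough in-house data collection cost: labor + onboarding + equipment."""
    labor = num_collectors * hourly_rate * hours_per_collector
    onboarding = num_collectors * onboarding_cost_per_collector
    return labor + onboarding + equipment_cost


# Hypothetical figures, for illustration only.
estimate = in_house_cost(
    num_collectors=10,
    hourly_rate=20,
    hours_per_collector=100,
    onboarding_cost_per_collector=500,
    equipment_cost=3000,
)
print(estimate)  # 28000
```

Comparing this number with a vendor quote or the price of a prepackaged dataset gives a first indication of which option is affordable.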

Further reading

  • Top 6 Data Collection Best Practices
  • AI Data Collection: Quick Guide, Challenges & Top 4 Methods
  • Data Collection Automation: Pros, Cons, & 3 Methods

Shehmir Javaid is an industry analyst at AIMultiple. He has a background in logistics and supply chain management research and loves learning about innovative technology and sustainability. He completed his MSc in logistics and operations management at Cardiff University, UK, and his bachelor's in international business administration at Cardiff Metropolitan University, UK.

What are the potential challenges and limitations of the data collection methods?

Challenges in current data collection practices:

  • Inconsistent data collection standards
  • Context of data collection
  • Data collection is not core to business function
  • Complexity
  • Lack of training in data collection
  • Lack of quality assurance processes
  • Changes to definitions and policies and maintaining data comparability

What are common challenges in data collection?

  • Data quality issues. Raw data typically includes errors, inconsistencies and other issues.
  • Finding relevant data
  • Deciding what data to collect
  • Dealing with big data
  • Low response and other research issues

What are the 3 most commonly used data collection methods in research?

Here are the top 5 data collection methods and examples that we've summarized for you:

  • Surveys and Questionnaires
  • Interviews
  • Observations
  • Records and Documents
  • Focus Groups

What are the disadvantages of data collection?

The disadvantages of collecting data through participant observation are (1) costly staff necessary to conduct the research observations; (2) the research can be quite time consuming; (3) the problem of fitting the observer into the setting of research interest unobtrusively and without publicity; (4) potential bias or ...