McKinsey’s recent survey reported that the adoption of AI has increased rapidly worldwide (see Figure 1). However, the glass is still half-empty: almost half of the world has yet to leverage the power of AI.
This gap can be due to the various challenges and barriers businesses face while developing and implementing the technology. Some of those challenges arise during the data collection/harvesting phase of developing an AI/ML model. This article explores 4 data collection challenges and ways to overcome them, to help business leaders and developers streamline AI development and implementation.

Figure 1. AI adoption from 2020 to 2021
Source: McKinsey

1. Selecting the right dataset

Data can be considered the fuel for an AI/ML model, and determining the dataset is one of the most crucial steps of data collection. A key challenge at this stage is myopic data: data that does not cover the full scope of the project and is not aligned with the real-world activities the model will perform. A study by the University of California and Google found that, in the machine learning development community, the majority of the datasets used to train models are reused or borrowed. Such reuse can misalign the data with the project objectives and result in an inaccurate finished product.

Solution

To overcome this challenge, the following steps can be taken:
2. Avoiding data bias

Collecting biased data can lead to a biased and erroneous AI/ML model and thus should be avoided. For instance, if the dataset used to train a patient referral system does not include male patients or patients with lower income levels, it will provide biased and erroneous outcomes when implemented in a real clinic. This bias can unintentionally be transferred by the data collector into the AI model.

Solution

The following steps can be taken to overcome data bias while harvesting data:
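As a minimal sketch of how the coverage gap in the patient referral example could be caught before training, the Python snippet below counts how often each expected group appears in a dataset and flags groups that fall under a chosen share. The function name `underrepresented_groups`, the `min_share` threshold of 10%, and the toy records are all hypothetical, invented for illustration; a real project would choose groups and thresholds to match its own target population.

```python
from collections import Counter


def underrepresented_groups(records, attribute, expected_values, min_share=0.10):
    """Return {value: share} for expected values whose share is below min_share.

    `min_share` is an assumed, illustrative threshold, not an established standard.
    """
    counts = Counter(record.get(attribute) for record in records)
    total = len(records)
    flagged = {}
    for value in expected_values:
        # A value that never appears has a count of 0 and therefore a share of 0.0.
        share = counts.get(value, 0) / total if total else 0.0
        if share < min_share:
            flagged[value] = share
    return flagged


# Hypothetical toy records: no male patients and no low-income patients.
patients = [
    {"sex": "female", "income": "high"},
    {"sex": "female", "income": "middle"},
    {"sex": "female", "income": "high"},
    {"sex": "female", "income": "middle"},
]

print(underrepresented_groups(patients, "sex", ["female", "male"]))
print(underrepresented_groups(patients, "income", ["low", "middle", "high"]))
```

Listing the expected values explicitly is a deliberate choice in this sketch: a group that is entirely absent from the data would never show up in a plain frequency count, so it has to be named up front to be detected.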
Sponsored

Clickworker can help you overcome data collection challenges with their crowdsourcing model. They work with over 4 million registered data collectors who are proficient in 45 languages and over 70 different target markets.

3. Data protection and legal issues

This section explains some ethical and legal constraints to data collection:

Data protection

Not all data is readily and publicly available to use. Some data is sensitive in nature and cannot be accessed easily, which makes it challenging to collect. For instance, training a computer vision system for radiology requires thousands of medical images. This type of data can be expensive to collect and can have various ethical constraints attached.

Legal issues

Data collection is not as easy as it used to be. As people and government bodies recognize the risks of data exploitation, they are making more efforts to regulate data collection and improve data protection.

Solution

To avoid these issues, it can be helpful to consider the following questions prior to data collection:
Answering these questions and clearly explaining the answers to the participants can make the whole data collection process transparent. It is also important to check the data collection rules set by the relevant regulatory body in the country in which the data is being collected.

4. Underestimating the costs

Large datasets require a large number of data collectors, and the resulting costs can pose a barrier. For instance, if a company opts for in-house data collection for an ML project, it will have to perform the following tasks:
This process can be unaffordable or even overwhelming for some businesses, and it can stall the entire project.

Solution

The following considerations can help overcome this challenge:
Shehmir Javaid is an industry analyst at AIMultiple. He has a background in logistics and supply chain management research and loves learning about innovative technology and sustainability. He completed his MSc in logistics and operations management at Cardiff University, UK, and his Bachelor's in international business administration at Cardiff Metropolitan University, UK.