Studying Product Competition Using Representation Learning

Davide Proserpio
Isamar Troncoso

In SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020, pp. 1261-1268.

DOI: https://doi.org/10.1145/3397271.3401041

Abstract:

Studying competition and market structure at the product level instead of brand level can provide firms with insights on cannibalization and product line optimization. However, it is computationally challenging to analyze product-level competition for the millions of products available on e-commerce platforms. We introduce Product2Vec, a ...

Introduction
  • Identifying key competitors is essential to firms’ competitive strategies, such as pricing, product design, and positioning [3, 7, 20]. However, competition does not occur only among different brands: it also occurs among different products within the same brand.
  • Word embeddings are designed to capture semantic similarities between words: words that appear in similar contexts in a corpus of text end up close to each other in the word vector space.
  • The authors treat shopping baskets as sentences and products as words, and use representation learning to transform each product into a vector (see the sketch after this list).
  • The authors use the resulting vectors to define two measures, complementarity and exchangeability, that identify complements and substitutes, respectively.
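As noted above, the basket-as-sentence idea maps directly onto off-the-shelf Word2Vec tooling. Below is a minimal sketch using the gensim library; the toy baskets and all hyperparameter values are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch: Word2Vec-style product embeddings trained on baskets.
# The toy baskets and all hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

# Each basket is one "sentence"; each product ID is one "word".
baskets = [
    ["milk_001", "cereal_017", "banana_003"],
    ["milk_001", "coffee_042"],
    ["cereal_017", "milk_002", "banana_003"],
]

model = Word2Vec(
    sentences=baskets,
    vector_size=30,  # embedding dimension
    window=100,      # wider than any basket: all co-purchases are context
    min_count=1,     # keep rare products in this toy example
    sg=1,            # skip-gram with negative sampling, as in Word2Vec
    negative=5,
    epochs=50,
)

# Products bought in similar basket contexts end up close in vector space.
print(model.wv.most_similar("milk_001", topn=3))
```

Because items in a basket are unordered, setting the window wider than any basket makes every co-purchased product count as context, which is the natural analogue of a sentence window for shopping data.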
Highlights
  • We treat shopping baskets as sentences and products as words and use representation learning to transform each product into a vector
  • We show that products that share common shopping contexts, and are therefore close in the vector space, are more likely to be complements or substitutes than products that are far apart (see the scoring sketch after this list).
  • In Table 2, column 1 reports the estimates of the target model; column 2, the estimates of the FH model; columns 3 and 4, choice models that use product vectors from Product2Vec, without and with the complementarity and exchangeability measures; and columns 5 and 6, the corresponding models that use vectors from Revised Product2Vec.
  • When compared with recently developed methods to predict consumer purchases [18], we achieve higher out-of-sample hit rates (14.1% vs. 12.8%)
  • Our paper proposes Product2Vec, a method based on the representation learning algorithm Word2Vec, to understand product-level competition in large markets with millions of products
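The paper defines the two measures formally; the scoring sketch below is a loose illustration only. It proxies complementarity by how strongly two products predict each other as basket context (affinity between input and output vectors) and exchangeability by how similar the two products' predicted context distributions are, compared via a symmetric Kullback-Leibler divergence [12]. The function names and formulas are hypothetical simplifications, not the paper's equations.

```python
# Illustrative, simplified scores; NOT the paper's exact formulas.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def complementarity(v_in_i, v_out_j, v_in_j, v_out_i):
    # Hypothetical proxy: products that strongly predict each other as
    # basket context score high (symmetrized input/output affinity).
    return 0.5 * (v_in_i @ v_out_j + v_in_j @ v_out_i)

def exchangeability(v_in_i, v_in_j, V_out):
    # Hypothetical proxy: substitutes induce similar context distributions
    # over all other products; compare with a symmetric KL divergence.
    p = softmax(V_out @ v_in_i)
    q = softmax(V_out @ v_in_j)
    kl_pq = float(np.sum(p * np.log(p / q)))
    kl_qp = float(np.sum(q * np.log(q / p)))
    return -0.5 * (kl_pq + kl_qp)  # higher = more exchangeable
```

Complements then surface as pairs with the highest complementarity scores, and substitutes as pairs with high exchangeability but low complementarity, matching the selection rule described under Results.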
Results
  • 4.1 Data

    The authors test the model using the IRI scanner panel data [5], which contains weekly transaction information for 30 product categories and 13,124 unique products, purchased by 5,214 households across 53 stores over 52 weeks.
  • The total number of shopping baskets amounts to 280,052.
  • The authors divide the data into three sets: a 40% training set, a 40% estimation set, and a 20% test set.
  • The authors expect that complements are the products with the highest complementarity scores, and substitutes are the products with the highest exchangeability scores, conditional on having low complementarity scores.
  • The authors list the top-3 product pairs with the highest complementarity scores and the top-3 pairs with the highest exchangeability scores and low complementarity scores, including the complementarity and exchangeability values for each pair (see the ranking sketch after this list).
  • The pairs identified as complements are intuitive, and the same is true for substitutes: the model successfully recovers pairs of competing products from the same brand or from different brands.
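With scores like those sketched earlier, the selection rule in the preceding bullets reduces to a ranking plus a filter, where `comp_score` and `exch_score` are assumed to map a product pair to a scalar. The helper below is illustrative; the `comp_ceiling` threshold and the top-3 cut-off are arbitrary assumptions.

```python
# Rank product pairs using externally supplied scoring functions.
# `comp_ceiling` and k are arbitrary assumptions for illustration.
from itertools import combinations

def top_pairs(products, comp_score, exch_score, k=3, comp_ceiling=0.0):
    pairs = list(combinations(products, 2))
    # Complements: pairs with the highest complementarity scores.
    complements = sorted(pairs, key=comp_score, reverse=True)[:k]
    # Substitutes: highest exchangeability among low-complementarity pairs.
    low_comp = [p for p in pairs if comp_score(p) < comp_ceiling]
    substitutes = sorted(low_comp, key=exch_score, reverse=True)[:k]
    return complements, substitutes
```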
Conclusion
  • The authors' paper proposes Product2Vec, a method based on the representation learning algorithm Word2Vec, to understand product-level competition in large markets with millions of products.
  • The authors' model allows them to differentiate substitutes and complements using two metrics, exchangeability and complementarity.
  • By combining these vectors with random utility-based choice models, the authors can forecast demand more quickly and accurately.
  • This is important for firms to make precise and timely predictions of future sales.
  • The authors’ model can estimate price elasticities more accurately by removing the influence of price from the product vectors (see the sketch after this list).
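Revised Product2Vec removes price information while the embeddings are trained; as a rough post-hoc stand-in for that idea, one could residualize each embedding dimension on log price and keep only the price-orthogonal variation. The sketch below is that simplification, not the authors' procedure.

```python
# Post-hoc sketch: project (log) price information out of product vectors.
# A linear residualization stand-in, not the paper's training-time approach.
import numpy as np

def remove_price(vectors, log_prices):
    """vectors: (n_products, d); log_prices: (n_products,)."""
    X = np.column_stack([np.ones_like(log_prices), log_prices])
    # OLS of every embedding dimension on an intercept and log price.
    beta, *_ = np.linalg.lstsq(X, vectors, rcond=None)
    return vectors - X @ beta  # price-orthogonal product vectors
```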
Objectives
  • The authors’ goal is to measure how sensitive consumers are to price changes and to perform demand forecasts by combining representation learning and choice models (a bare-bones choice-model sketch follows).
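To make "combining representation learning and choice models" concrete, below is a bare-bones conditional logit in which each alternative's utility is a price term plus a linear function of its product vector. This is a generic logit sketch, not the paper's mixed logit [13], and the array shapes and names are assumptions.

```python
# Bare-bones conditional logit with product embeddings as attributes.
# Generic sketch, not the paper's mixed logit; shapes are assumptions.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, prices, embeds, choices):
    """prices: (T, J); embeds: (J, d); choices: (T,) chosen indices."""
    beta_price, w = theta[0], theta[1:]
    util = beta_price * prices + embeds @ w        # (T, J) utilities
    util -= util.max(axis=1, keepdims=True)        # numerical stability
    logp = util - np.log(np.exp(util).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(choices)), choices].sum()

def fit(prices, embeds, choices):
    theta0 = np.zeros(1 + embeds.shape[1])
    res = minimize(neg_log_lik, theta0, args=(prices, embeds, choices),
                   method="BFGS")
    return res.x  # [beta_price, w_1, ..., w_d]
```

Given the fitted price coefficient, the standard logit own-price elasticity of alternative j at a choice occasion is beta_price * price_j * (1 - P_j), which is the quantity a price-sensitivity analysis needs.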
Tables
  • Table1: Examples of Complements and Substitutes Found by Revised Product2Vec
  • Table2: Results of the Mixed Logit Model
  • Table3: Comparing Choice Models with SHOPPER
References
  • Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2013. Polyglot: Distributed word representations for multilingual NLP. arXiv preprint arXiv:1307.1662 (2013).
  • Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research 3 (2003), 1137–1155.
  • Mark Bergen and Margaret A. Peteraf. 2002. Competitor identification and competitor analysis: A broad-based managerial approach. Managerial and Decision Economics 23, 4-5 (2002), 157–169.
  • Steven Berry, James Levinsohn, and Ariel Pakes. 1995. Automobile prices in market equilibrium. Econometrica: Journal of the Econometric Society (1995), 841–890.
  • Bart J. Bronnenberg, Michael W. Kruger, and Carl F. Mela. 2008. Database paper: The IRI marketing data set. Marketing Science 27, 4 (2008), 745–748.
  • Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12 (2011), 2493–2537.
  • Wayne S. DeSarbo, Ajay K. Manrai, and Lalita A. Manrai. 1993. Non-spatial tree models for the assessment of competitive market structure: An integrated review of the marketing and psychometric literature. Handbooks in Operations Research and Management Science 5 (1993), 193–257.
  • Peter S. Fader and Bruce G. S. Hardie. 1996. Modeling consumer choice among SKUs. Journal of Marketing Research 33, 4 (1996), 442–452.
  • Sebastian Gabel, Daniel Guhl, and Daniel Klapper. 2019. P2V-MAP: Mapping market structures for large retail assortments. Journal of Marketing Research (2019).
  • Peter M. Guadagni and John D. C. Little. 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2, 3 (1983), 203–238.
  • John R. Hauser. 2014. Consideration-set heuristics. Journal of Business Research 67, 8 (2014), 1688–1699.
  • Solomon Kullback and Richard A. Leibler. 1951. On information and sufficiency. The Annals of Mathematical Statistics 22, 1 (1951), 79–86.
  • Daniel McFadden and Kenneth Train. 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics 15, 5 (2000), 447–470.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
  • Andriy Mnih and Koray Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In Advances in Neural Information Processing Systems. 2265–2273.
  • Amil Petrin and Kenneth Train. 2010. A control function approach to endogeneity in consumer choice models. Journal of Marketing Research 47, 1 (2010), 3–13.
  • Francisco J. R. Ruiz, Susan Athey, and David M. Blei. 2019. SHOPPER: A probabilistic model of consumer choice with substitutes and complements. Annals of Applied Statistics (2019).
  • Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 384–394.
  • Glen L. Urban, Philip L. Johnson, and John R. Hauser. 1984. Testing competitive market structures. Marketing Science 3, 2 (1984), 83–112.