Wild guessing is over

Thinking Of Mischief

Through the years, the charts and sales community have been accessing more and more information. Areas of sales that remained black boxes for long are now kind of familiar, for example music clubs and sales in some Latin American and Asian markets. A couple of months ago, I presented a new way to accurately gauge success with the help of YouTube Insights, providing a method to put numbers based on factual information rather than wild guessing. Today, I’m happy to say that the wild guessing era is over. A massively powerful tool fills in every data hole that remained up to now. Here is how to move from subjective estimations to accurate systematic processes.

To best understand the true strength of the method I’m explaining today, it’s key to have a clear view of how the estimation task works. So far, we have been using 4 distinct layers to estimate sales of a record, each one bringing a new level of accuracy. Today, we are adding a 5th perspective that will amaze you. Our target isn’t to value sales of albums and markets that are already known. For them, we already have certifications, Soundscan figures, charts, etc. Our goal is to accurately value sales of Madonna’s True Blue in Bulgaria, the Beatles’ A Taste of Honey in Russia or Elvis Presley’s uncharted and uncertified records. Yes, we can do that, and it doesn’t involve any bias whatsoever.

Estimation methods

Accuracy level 1 – Wild guessing

Estimates that I’m defining as wild guessing are the ones which involve no factual information. More often than not, they are the result of very basic knowledge and a lot of wishful thinking. Obviously, the accuracy of this method is fairly low. Each of the 4 layers that we will explain enables us to get rid of this method for virtually every record that ever came out. This includes the ones for which we have no chart and sales data at all.

Accuracy level 2 – Acknowledging the market share

The most basic factual data one can use is the market size. As simple as it seems, this already requires some knowledge. The market size of a country changes every year, and so does its share of global sales. An album also sells copies over many years, so there is no perfect share we can use. If one wants to know how many copies Pink Floyd’s Dark Side of the Moon sold in Asia minus Japan, one can expect 24% of its sales to come from there and so assume roughly 10 million units.

While this figure includes less bias than wild guessing, it’s still a very weak way of estimating sales. I used the case of the Asian continent and the group Pink Floyd on purpose. Various Asian markets, including China and India, are massively dominated by local records. Then, even when factoring this in, Pink Floyd are relatively weaker there than in other parts of the World. That’s why we reach a value of 10 million while the truth is 20+ times lower. Clearly, we can’t be satisfied if we stick to this method.

Accuracy level 3 – Acknowledging the artist pattern

The issue with the market constant method is that it doesn’t take into account the popularity of the artist you are working on. Even if P!nk did much better than Avril Lavigne globally, nobody seriously expects the former to have outsold the latter in Asia. The question is, why do we expect the Canadian star to do better? It’s because of the artist pattern.

In the past, various lists of awards leaked about both artists, evidencing heavy success in Asia for Lavigne and a modest one for P!nk. Rather than going with 10%, we can then go with maybe 15% and 5%, respectively. I’m not putting realistic numbers on purpose. What’s important is to understand the logic, not the figure. In order to set up valid shares of sales coming from this continent, one would need to check the real percentage for records with known certifications and then use that pattern on remaining albums for which no data is available. We are getting better but there is still a lot of room for improvement.

Accuracy level 4 – Acknowledging the songs’ popularity

At the moment, we consider the market size and who did well or not on each of them. An artist can flop in a country where he is popular though, and vice versa. The percentage of his global sales coming from a specific country will vary from one album to the next. Our flaw here is that we do not consider performance indicators related to the era we are looking at.

We solved this issue thanks to the YouTube Insights tool. It tells us which songs are the most popular in each market. Therefore, when an artist over- or underperforms their standards in a market, we can adjust the estimate accordingly. If an album has sold 5% of its global units in Latin America and registered 10% of its YouTube views there, and singles of the follow-up enjoyed 8% of their global views there, we can expect the follow-up to have sold 4% of its copies in the region.
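The adjustment above can be written as a one-line proportion. This is only a minimal sketch of that reasoning: the function name is hypothetical, and it assumes the sales share moves in direct proportion to the view share, which is the simplification used in the example.

```python
def expected_sales_share(prev_sales_share, prev_view_share, next_view_share):
    """Scale a known regional sales share by the change in regional view share.

    Assumes sales share is proportional to YouTube view share (a simplification).
    """
    return prev_sales_share * next_view_share / prev_view_share


# Figures from the Latin America example in the text
follow_up_share = expected_sales_share(
    prev_sales_share=0.05,  # previous album: 5% of global sales in the region
    prev_view_share=0.10,   # previous album: 10% of global views in the region
    next_view_share=0.08,   # follow-up singles: 8% of global views in the region
)
print(f"{follow_up_share:.1%}")
```

The same function works for any pair of eras by one artist, as long as both view shares are read for the same market.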

With this, we went from wild guessing to accurately knowing which era has been popular in each country. Thanks to the patterns of the market and the artist, this gives us a very good idea of the sales achieved by most records.

There is still the case of artists with compilations and live albums. A popular single can sell a studio album, but also a greatest hits set. If we are factoring in the size of the market, the strength of the artist and the popularity of the content of each album, we are still missing data about the record itself. Well, we were still missing it…

Accuracy level 5 – Factoring in record-specific indicators

Discogs is our savior. If you have never been on that site, it’s probably because you started to get into music when the digital era had already started. Discogs is a crowdsourced database of music releases. For example, currently 3,205 distinct releases of Elvis Presley are listed. Then, the 1958 compilation Golden Records alone has 197 different versions listed, with the year and the area of release for most of them.

While this is already a superb place to view the discography of each artist, including the ones with plenty of minor compilations, there is much more to extract from the site. In fact, for every version of an album, we can see how many users of the website own it. A few years ago, that data wasn’t necessarily relevant. With 15 million unique monthly visitors though, the database is currently a real treasure.

If you ever wondered how many units an album sold in a specific music club or in a specific country, you will now be able to replace that black box with organic, detailed and easy to use figures. The following pair of pictures represents how reliable (green) or not (black) were estimations depending on the market concerned and the type of release. The first one illustrates the situation without Discogs while the second shows how much better that gets with it.

Without Discogs
With Discogs

Why is Discogs data so relevant?


Organic data

There are zillions of numbers here and there about everything, and the media no doubt know how to use the most misleading of them to reach biased conclusions. The story isn’t different in the music charts and sales world. For a long time, I have been searching for every possible indicator that is organic, meaning it isn’t artificially corrupted by promotion. A thousand streams randomly achieved on Spotify represent a much lower financial gross than 1,000 box sets sold to fans of an artist, but they are way, way more relevant to illustrate success, precisely because they are organic. The same is true for YouTube views or catalog sales of products that haven’t been promoted for a long time. Organic indicators are powerful because they highlight a consistent pattern rather than temporary hype created by an artificial tool.

The number of Discogs users who own each record is an organic indicator. As Discogs is also a marketplace, users don’t list only their Beatles albums to look good. Even the most obscure LP can be sold. On top of that, these users love to show off a huge collection, so they enter every record they own, making it a legitimate summary of their past purchases.

Known population of users

That’s the second element that makes this data so beautiful. The population of Discogs is known. Most people using that site are LP collectors, so they are hardly representative of the entire population. It doesn’t matter though since we know about it.

For example, we know that Michael Jackson’s Thriller outsold Pink Floyd’s Dark Side of the Moon by about 3 to 2. On Discogs, they are owned by 105,523 and 162,236 users respectively, giving a 3 to 2 edge to the latter. We could just throw away Discogs owners statistics and say they aren’t accurate. We can also understand that Dark Side is simply more collected while Thriller touched a wider public outside of music fans, and set up a relevant factor to still use that data.

The figures above suggest Pink Floyd’s classic album is collected 2.3 times more on Discogs relative to its sales. Of course, we already know a lot about sales of these two records, so we may doubt that the site will help us. If we speak about Mexico though, while we know that Thriller (without the 25th Anniversary edition) shipped 1.6 million thanks to its Amprofon certification, we have zero data about Dark Side. Their Mexican editions are owned respectively by 414 and 205 users. Since each Thriller owner stands for 2.3 times more sales, that creates a 4.65 to 1 gap in sales between both.

This suggests 345,000 units sold by Dark Side in Mexico. Madonna’s True Blue has a Discogs owners versus global sales rate about 10% higher than Thriller. With 149 owners in Mexico, we can expect about 635,000 sales from there. The same method gives nearly 200,000 sales for her debut album and 395,000 for Like A Virgin.
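The Dark Side of the Moon calculation can be reproduced in a few lines. This is a sketch of the arithmetic only; the function and variable names are hypothetical, and every figure comes from the paragraphs above.

```python
# Global picture, as quoted in the text
THRILLER_OWNERS = 105_523
DARK_SIDE_OWNERS = 162_236
GLOBAL_SALES_RATIO = 3 / 2  # Thriller outsold Dark Side by about 3 to 2

# How much more Dark Side is collected on Discogs relative to its sales
collector_factor = (DARK_SIDE_OWNERS / THRILLER_OWNERS) * GLOBAL_SALES_RATIO


def estimate_sales(ref_sales, ref_owners, target_owners, collector_factor):
    """Estimate sales of a target record from a reference record in one market.

    collector_factor > 1 means the target is over-collected versus the reference.
    """
    sales_gap = (ref_owners / target_owners) * collector_factor
    return ref_sales / sales_gap


# Mexico: Thriller shipped 1.6 million; 414 owners against 205 for Dark Side
dark_side_mexico = estimate_sales(
    ref_sales=1_600_000,
    ref_owners=414,
    target_owners=205,
    collector_factor=collector_factor,
)
print(f"collector factor: {collector_factor:.2f}")
print(f"Dark Side in Mexico: about {dark_side_mexico:,.0f} units")
```

The result lands within a couple of thousand units of the 345,000 quoted above; the small difference comes from rounding the factor to 2.3 and the gap to 4.65.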

Just like that, we went from knowing nothing about these albums in that market to having a precise, realistic, and unbiased number for them. Incredible, isn’t it? It may seem too easy, but the fact is these figures are spot on. Based on previously known data extrapolated with the other methods, True Blue, Madonna and Like A Virgin were estimated at 650,000, 175,000 and 350,000 respectively. That’s no coincidence. While this method isn’t official, it is organically representative of true sales.

Massive coverage

As of now, Discogs covers 5.3 million artists responsible for 10.2 million records, themselves representing over 45 million versions. The coverage of this database is unbeatable. Elvis Presley has about 100 distinct albums that got certified in at least one country and about 3 times as many that charted somewhere. With Discogs, we get data about 2,134 albums from him. This database provides information for plenty of records that are still entirely ignored by charts and sales experts.

I pointed out the example of Madonna in a relatively large market. If one wants to get information about Terry Jacks’ Seasons In The Sun in Colombia though, that’s fully possible too. Obviously, local acts are also available, as well as acts who pre-date the rock era.

Versions listed per market

More often than not, reports about album sales are global reports. We get no data about specific countries, or at best about the markets we already know about like the US and the UK. The most successful albums ever usually have versions from more than 50 different markets in the Discogs database. This can include countries as difficult to value as Guatemala and Zimbabwe.

That’s the wonderful part of Discogs: while the Master release of each record, which encapsulates owners of all versions, provides comprehensive statistics for the album, each sub-version also displays how many owners it has so it is possible to come up with a realistic estimation for all of them, from each market.

Versions listed per format and label

To see an album with 50 or more versions from the US alone may sound annoying. For sure it implies more work to collect the data. The truth is that the separation of all of them helps us tremendously.

It has been more than a decade since chart experts started to track which albums were issued by each music club, most notably BMG Music Club and Columbia House. I listed the 1997 database of the latter myself way back in 2007. Discogs has it all. If we look at Madonna’s Like A Prayer, we know it moved 244,000 units at BMG Music Club and that it was released by Columbia House as well. There are various versions from both clubs at Discogs. The main LP version from Columbia is here, while the CD version from BMG is available on this link. At the former club, the album has 390 LP owners, 343 CD owners and 18 cassette owners. At the latter, it has 192 LP owners, 88 CD owners and 16 cassette owners. That would suggest over 600,000 sales at Columbia House.
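The Columbia House figure follows from scaling the known BMG sales by the ratio of owners between the two clubs. The sketch below pools all formats together, which is a simplifying assumption of this example; the variable names are hypothetical and the numbers are the ones quoted above.

```python
BMG_KNOWN_SALES = 244_000  # Like A Prayer at BMG Music Club

# Discogs owners per format for the club versions of Like A Prayer
columbia_owners = {"LP": 390, "CD": 343, "Cassette": 18}
bmg_owners = {"LP": 192, "CD": 88, "Cassette": 16}

columbia_total = sum(columbia_owners.values())
bmg_total = sum(bmg_owners.values())

# Scale the known club sales by the owners ratio between both clubs
columbia_estimate = BMG_KNOWN_SALES * columbia_total / bmg_total
print(f"Columbia House owners: {columbia_total}, BMG owners: {bmg_total}")
print(f"Estimated Columbia House sales: about {columbia_estimate:,.0f}")
```

Pooling every format is the simplest choice, but formats are not collected at the same rate, so a finer version would scale each format separately.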

Interestingly, in the case of The Immaculate Collection, versions from each club add up to 307 and 318 owners, respectively. Should we assume the album also did about 1.4 million at Columbia House since it sold 1.46 million at BMG? Not exactly. Discogs has many more users collecting the LP format, and sales at that club were higher at the start of the 90s when they were still strong. This album has 3 times more LP owners of the Columbia version and 3 times more CD owners of the BMG issue, which distorts the conversion rate for this album. We can still factor these figures appropriately.

There have been music clubs outside of the US too. We tend to completely ignore them quite simply because we had nothing to track them. Discogs includes various Club versions for albums from Canada, Germany, Japan, etc.

What are the limits of the database?

There really is only One Way

Still not 100% complete

Of course, we can’t ask for everything overnight. While Discogs is far and away the most comprehensive database of music releases online, there are still minor versions and records to add. The current database already goes so deep into discographies though that missing records would no doubt be fairly low sellers. The comprehensiveness of the database is also exploding at the moment. Since I wrote down the owners of Dark Side of the Moon yesterday, there are nearly 500 more now. One may think this is a problem, but since all sales figures are extrapolated from relative owners between records, there is no flaw at all. The more people enter their collection on Discogs, the better it is for chart experts.

The format gap

As previously mentioned, LPs are much more collected than CDs. For that reason, 10,000 owners of a CD represent more sales than 10,000 owners of an LP. Similarly, Picture Discs are widely collected while cassettes aren’t. Earlier in this article, I compared Thriller with Dark Side of the Moon and True Blue in order to select albums from the LP era. Comparing it with, say, Mariah Carey’s Music Box, would make no sense.

The country gap

The usage of Discogs isn’t the same everywhere. For example, it isn’t much used in Japan, a bit more in the US and much more in Europe. For a specific album, we can’t convert German owners and US owners at the same rate, since users are, relatively speaking, much more numerous in the former country. To avoid this issue, we need to compare records that are a good gauge: German ones with German ones, etc.

The album’s aura gap

Bob Marley’s Legend has 44,156 owners against 23,674 for his studio album Exodus. The first is a compilation targeting the masses. The second was named album of the century by Time magazine and is an absolute must-have for Reggae fans. Also, most Exodus buyers purchased the LP while Legend sold most of its copies since 1991, on CD. That’s why the ratio of owners of less than 2 to 1 isn’t representative of the more than 4 to 1 gap (35 million to 8 million) in real sales. In the same way, from the discography of Elvis Presley, the Sun recordings, his first two albums and his box sets are highly favored by collectors, as opposed to love-themed compilations.
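A quick calculation shows how far off a naive conversion would be in this case. It is only an illustration of the gap described above, using the figures from the text; the variable names are hypothetical.

```python
LEGEND_OWNERS, EXODUS_OWNERS = 44_156, 23_674
LEGEND_SALES, EXODUS_SALES = 35_000_000, 8_000_000

owner_ratio = LEGEND_OWNERS / EXODUS_OWNERS  # what Discogs shows
sales_ratio = LEGEND_SALES / EXODUS_SALES    # what really happened

# Naive estimate: scale Exodus sales by the raw owners ratio
naive_legend_estimate = EXODUS_SALES * owner_ratio
shortfall = LEGEND_SALES / naive_legend_estimate

print(f"owners ratio: {owner_ratio:.2f} to 1")
print(f"sales ratio:  {sales_ratio:.2f} to 1")
print(f"naive Legend estimate: about {naive_legend_estimate:,.0f}")
print(f"real sales are {shortfall:.1f} times higher")
```

Using Exodus as the scale would leave Legend at less than half of its real sales, which is why the reference record has to share the same profile and format.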

Here too, avoiding the problem is easy. All it takes is using comparable albums to scale sales of releases. A good scale for Exodus would be his remaining albums, Kaya, Uprising, etc., or other popular albums from the late 70s by the likes of Supertramp or the Police. For its part, Legend is more comparable to massive 90s compilations like ABBA’s Gold and Madonna’s The Immaculate Collection.

As you have likely understood by now, when using Discogs data to get an accurate gauge of sales, the basic rule is to use records that are comparable in type, format and profile. Once you keep this in mind, it’s a highway to plenty of priceless results.

How to use Discogs information?


The usage of the Discogs data is pretty straightforward. We only need to read the number of people who own a record, at the right of the screen inside the Statistics section. To avoid confusion, there are still a couple of things to know.

Master vs. Release

Most international albums have multiple Releases on Discogs. For example, an album that got issued in LP, CD and Cassette in the US and Canada will have 6 such releases. All of them will have a specific number of owners. Below screenshot shows the number of releases for the early ABBA studio albums.

To properly list these albums, merging all versions together, a Master is defined. The master has a number of owners that is the cumulative tally of owners of all its versions. The master version will typically be the first that was entered inside Discogs’ database, so it can be a very minor one. We need to be cautious on search screens because masters can pop up in the middle of proper versions. For example, if we search for the most owned album in Mexico, Deep Purple’s Now What?! comes out on top in spite of only 10 owners from versions of that country. It’s down to the fact that its Mexican CD was the first entered, so its master is located there only.

There are many ways to avoid mixing masters with releases though. First, they are identified inside the URL itself:

Search screens also clearly distinguish both types of releases:

In addition, it’s possible to filter for main reference (master) or releases both from links at the top of the screen and by manipulating the URL. Links’ parameters are listed at the end of the article.
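When collecting many links, the URL check can be automated. The sketch below assumes that a Discogs page address contains a `master` or a `release` path segment, which is my reading of the URL distinction mentioned above; the function name and the example addresses (with placeholder IDs) are hypothetical.

```python
from urllib.parse import urlparse


def discogs_entry_type(url):
    """Return 'master', 'release' or 'unknown' based on the URL path segments.

    Assumes the entry type appears as its own segment in the path.
    """
    segments = [part for part in urlparse(url).path.split("/") if part]
    for kind in ("master", "release"):
        if kind in segments:
            return kind
    return "unknown"


# Placeholder addresses, for illustration only
examples = [
    "https://www.discogs.com/master/0000",
    "https://www.discogs.com/release/0000",
    "https://www.discogs.com/search/",
]
for link in examples:
    print(link, "->", discogs_entry_type(link))
```

Filtering a list of links this way keeps cumulative master tallies from being mixed with the owners of a single version.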

Manufactured vs. sold

Most main markets have their own production lines. It isn’t the case for smaller markets. In the past, even relatively decent markets still relied a lot on imports from elsewhere. While digging into Elvis Presley’s statistics, I saw German versions of many early albums owned by a lot of users. That’s down in part to Germany being the country that uses Discogs the most in proportion to its market among the top 5 global markets. It’s also due to the fact that in the 50s and the 60s the country was producing and shipping copies of most albums to Austria, Switzerland, Benelux and markets from Eastern Europe.

There are also units produced in a country exclusively to be sold elsewhere. One of the most owned Canadian versions of a Presley album is the Pickwick 1975 issue of the Christmas Album, which was in reality sold in the US.

Sorting searches by most collected items

The advanced search of Discogs lets you filter on many parameters. Then, the results screen includes a dropdown list that permits you to sort by most owned items. Below is the list of the most relevant filters to add directly inside the link to make it faster.

The standard result link:

The type parameter

By default it is set at all, meaning it searches for everything, from albums to singles, masters to releases, artists to labels. You can set it at type=release for specific versions of records or type=master for merged versions of a record.

The country parameter

It is possible to add &country= to the link. Then, you can add US, UK, Europe, Germany, Brazil, or whatever. For example, you may be interested in getting the most collected releases from Yugoslavia. The link, and the results:

NB: “Les plus collectionnés” means “The most collected”

We hardly ever knew a thing about Yugoslavian market and in a second, we get that insightful list. Thriller had shipped 112,000 units there by June 1984 as per Billboard. All its Yugoslavian releases add for 479 owners on Discogs. Dire Straits’ Brothers in Arms is more in line with tastes of collectors, but its 644 owners still suggest it topped the 6 digits mark in that market which is absolutely huge.

I only listed a few albums but as displayed the ranking includes over 45,000 versions of records. Also, I randomly displayed results for Yugoslavia, but the same can be listed for virtually every country which ever produced records. I’m sure you are now starting to see how truly insanely good this tool is.

The format parameter

Since formats are collected in different proportions, it makes sense to filter by them using the &format= string plus the format name. You can enter almost anything in it:

The year parameter

Very easy filter, you only need to add &year= with the year. Below are most owned albums (masters) from 1966 with no surprise at all:

The label parameter

Now we get something truly unexpected. Filtering on a label may sound useless. What if that label is RCA Music Service, the ancestor of the BMG Music Club? You barely need to add &label=RCA-music-service and see:

Obviously, we need to ignore Sisters Of Mercy who only happen to have all three words (RCA, Music, Service) on their label name. Apart from that technicality, we see the most owned versions from the club about which we knew nothing until now. Prince‘s Purple Rain emerges at the top, followed by the Greatest Hits of the Cars.

All albums listed minus High ‘N’ Dry gained multiple platinum certifications from the RIAA when all club sales were allowed in 1994. Interestingly, the Def Leppard album hasn’t been audited since 1992, meaning it has undoubtedly been undercertified for a very long time. The same lists can be displayed for BMG Direct Marketing (the manufacturer of the club) and Columbia House. They are respectively led by Dirty Dancing and Phil Collins’ But Seriously.

The artist parameter

Of course, you may also be interested in looking for stats of a specific artist. You can get into very detailed searches. Below are the most owned Madonna LPs in the Philippines.

Like A Virgin leads the way with 40 owners against 39 for True Blue. You can go deeper and wonder what it means to get this number of owners from the Philippines. To do that, you only need to remove the Madonna filter and add year=1984 and then year=1986. Both albums rank 3rd in their respective year. That being said, the Footloose soundtrack leads 1984 with a monstrous 668 owners, which speaks volumes about its sales there. As a comparison, the count of Thriller from the Philippines stands at 122.
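The filters described in this section can be assembled programmatically rather than typed by hand. This is a minimal sketch: the base address and the function name are assumptions of mine, only the parameter names given above (type, country, format, year, label) are taken from the article, and sorting by most owned is left to the dropdown on the results screen.

```python
from urllib.parse import urlencode

# Assumed base address of the search results page
SEARCH_BASE = "https://www.discogs.com/search/"


def build_search_url(**filters):
    """Build a search link from the filters described in the article.

    Filters set to None are skipped; values are URL-encoded.
    """
    params = {key: value for key, value in filters.items() if value is not None}
    return f"{SEARCH_BASE}?{urlencode(params)}"


# Most collected releases from Yugoslavia
print(build_search_url(type="release", country="Yugoslavia"))

# Albums (masters) from 1966
print(build_search_url(type="master", year=1966))

# Versions issued by the RCA Music Service club
print(build_search_url(type="release", label="RCA-music-service"))
```

Combining several filters in one call, for example a country with a year, mirrors the Philippines comparison above.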


Estimates are always very tricky, and filling in unknowns is undoubtedly the hardest part. It’s already great to have indirect sales indicators like the market size, performance patterns of the artist and popularity of each song per country. To get direct figures like the ones on Discogs, proportional to real sales, is an unexpected gift, even more so when it concerns obscure releases. Its statistics provide us with countless pieces of incredible data. The following article will present concrete examples of its uses from the discography of Elvis Presley. In his case alone, it fills estimation holes for several hundred albums. Stay tuned!
