Forum
Ok, I think I get it now. It is very much a case of trial and error which leads to a better understanding of the data as a fluid whole rather than working on individual items. As the information builds up it becomes more intuitive, almost like AI in some kind of way?
I’m always keen to learn more. You say you have complete ratio sets for lots of formats and then explain how a cassette can be worth roughly 10 times more than the vinyl. In Sinatra’s prime it was of course mainly cartridges in the US, so you have a ratio for that too then? How does it work exactly? If you have a total of, say, 14 cassette/cartridge owners on Discogs you kind of make that the equal of 140 vinyl?
As you say, that is an ‘empirical approach’ based on thousands of observations and is, I presume, of secondary use if you have better data like good and trusted chart runs and/or awards to use first and foremost? One of the standout compilation breakdowns you have presented was of the 1966 ‘Live At The Sands’ album, but surely your US total is in error at 1,545,000 when, in the chart run attachment example you gave a few posts ago, you have the release calculated at 368,235? Something is wrong here, isn’t it, or is the seeming discrepancy explained by the Discogs data ‘presence’ you describe, where more obscure releases better calibrate the result over the many years that follow?
Fascinating as always.
You are spot on with all your questions! It is indeed a kind of machine learning. By default, 14 cassette/cartridge owners would represent about as many real-world sales as 140 LP owners. You are also correct that we use primary data first, before filling in the holes with these indicators and patterns from Discogs!
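As a minimal sketch of that default weighting (the function and dictionary names are hypothetical, and only the roughly 10x cassette/cartridge weight comes from this thread), owner counts can be converted into LP-owner equivalents like this:

```python
# Hypothetical format weights, expressed in LP-owner equivalents.
# Only the cassette/cartridge value (roughly 10) is taken from the thread;
# real ratio sets would vary by era, market and format.
LP_EQUIVALENT_WEIGHT = {"LP": 1, "cassette": 10, "cartridge": 10}


def lp_equivalent_owners(owners_by_format):
    """Convert Discogs owner counts per format into LP-owner equivalents."""
    return sum(LP_EQUIVALENT_WEIGHT[fmt] * count
               for fmt, count in owners_by_format.items())


# 14 cassette/cartridge owners count like 140 LP owners by default.
print(lp_equivalent_owners({"cassette": 14}))  # 140
```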
I do not have all the details in mind anymore, but I do remember being impressed at how good the catalog sales of Live At The Sands were. At times, depending on how a discography is built, live albums work as compilations (an obvious example used to be AC/DC's Live), and it seems LATS somewhat took on that role as well. Its chart run does reflect 368k sales, although as I said Sinatra's sales were underrated by Billboard's charts at that point, which can also be verified by this release going Gold in late 1967.
The run and certification suggest half a million shipped (and virtually sold) by the start of 1968. That is when Discogs starts helping. The initial versions combine for 4,173 owners, while vinyl versions issued from 1968 onwards combine for 1,711, so we are looking at a bit more than 700,000 sales in the vinyl era. Its CD version was released before Soundscan, in 1986. Thanks to several leaks of data through the years, we have a very good estimate of its Soundscan-era sales: 660,000 copies sold since 1991. Its pace of sales during the 90s (20k/year) also lets us calibrate Discogs' indicators to estimate its sales from 1986 to 1990, in that case 175,000 units.
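The vinyl-era step can be sketched as a simple proportional scaling. This is an assumption rather than the stated method: it supposes that sales per Discogs owner are the same for the later pressings as for the initial ones, which happens to land close to the figure quoted above. The function name is hypothetical.

```python
def scale_sales_by_owners(known_sales, known_owners, other_owners):
    """Assumed proportional rule: same sales-per-owner for both groups."""
    return known_sales * other_owners / known_owners


initial_sales = 500_000    # shipped by the start of 1968 (run + Gold award)
initial_owners = 4_173     # Discogs owners of the initial versions
later_owners = 1_711       # Discogs owners of vinyl issued from 1968 onwards

later_sales = scale_sales_by_owners(initial_sales, initial_owners, later_owners)
vinyl_era_total = initial_sales + later_sales

print(round(later_sales))      # about 205,000
print(round(vinyl_era_total))  # about 705,000, "a bit more than 700,000"
```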
Add to that a tiny extra from club sales (0.3%, as shown by Discogs too), and we get, in thousands, 705 (LP era sales) + 175 (CD 86-90) + 660 (Soundscan) + 5 (Clubs) = 1,545, i.e. 1,545,000 copies sold in the US. As a pre-1982 double LP, it is eligible for 3xPlatinum.
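For anyone wanting to check the arithmetic, here is the same sum as a short sketch (the dictionary keys are just labels chosen for illustration; the values are the thousands quoted above):

```python
# Components of the US total, in thousands of copies, as quoted in the post.
components_k = {
    "LP era": 705,
    "CD 1986-90": 175,
    "Soundscan era": 660,
    "Clubs": 5,
}

total_k = sum(components_k.values())
print(f"{total_k * 1000:,}")  # 1,545,000

# Share of club sales in the total, to compare with the ~0.3% mentioned.
print(f"{components_k['Clubs'] / total_k:.1%}")  # 0.3%
```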
This is actually a good example of how totals are constructed, adding every piece of information together to be as comprehensive and accurate as we can!
I follow the totals construction process up until the CD era, Guillaume. You say 660k is the Soundscan era total from 1991, then refer to it being 20k a year on average in the 90s, which doesn't compute! I could see how you get the Discogs ownership data to provide an estimate for the vinyl era after 1968, but try as I might I can't get the CD ownership numbers to fit in the same way.
I found various reasonable-sized ownership entries for the CD of 535, 162, and some undated ones of 92 and 267, which total 1,056, and then there are 72 owners for the 1986 CD pre-Soundscan era release and 34 for the 1990 MC (which x10 = 340 perhaps?) so maybe 412, but that doesn't equate to 175,000 units from 1986-1991 - if anything it is more!
Sorry to be so dumb, but can you just walk me through this stage again? Especially as it is a good example of pulling all the information together. It is definitely the sort of thing you should explain in one of your useful decoding articles some time too!
I didn't provide all the details for the CD sales indeed, and yes, the cassette release also has an impact, although I tried to keep it simple! There are two sets of data that leaked for this album: 273k scanned up to the end of 2005 (from 1991, hence the roughly 20k per year), and 571k by March 2013. It was part of multiple sales at iTunes, where it did very well (it sold 16,451 on that leaked week in March 2013). The 20k/year is an average; in truth the album was likely selling a bit less in 1991-1997, and got boosted, like the rest of his catalog, by his passing and the '98 reissues. That's in part why it seems sales accelerated later on. That being said, it must have been stable through the years, as it did not chart in the catalog list in 1998, unlike many other albums, and to sell 20k/year without ever making that list, an album has to be very consistent.
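A rough sketch of the pace implied by the two leaked figures is below. The period lengths are simplifying assumptions (1991 is counted as a full year, and the second period runs from 2006 through March 2013), so the results are only indicative.

```python
scans_to_2005 = 273_000   # leaked total, 1991 through end of 2005
scans_to_2013 = 571_000   # leaked total through March 2013

# Assumed period lengths in years (approximations, see note above).
years_first = 2005 - 1991 + 1    # 15 calendar years
years_second = 7 + 3 / 12        # 2006 through March 2013

pace_first = scans_to_2005 / years_first
pace_second = (scans_to_2013 - scans_to_2005) / years_second

print(round(pace_first))   # about 18,000 per year, i.e. "roughly 20k"
print(round(pace_second))  # about 41,000 per year, the later acceleration
```

The gap between the two averages is what reconciles "20k a year in the 90s" with a much larger Soundscan-era total.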
For CDs the year is often missing (it generally wasn't printed). For Sinatra you've got mainly two versions: the one from the 80s and the 1998 Entertainer of the Century reissue. Combining the other entries for these releases, they have respectively 340 and 702 owners. A key 'detail' is that the older CDs are, the less collected they are, as early adopters of this format have already moved back to vinyl. We can see it with this album too, as the seller counts are indeed much closer (23 to 34).
Then we enter guesswork territory. Discogs' data is good enough to suggest the album's versions from the 80s sold well. To give some perspective, Duets (arguably less collected as it is more mainstream) has about 2,000 CD owners from the US. I do not remember my exact reasoning back then (I do not have finer details listed in the file), but it should have been something like 50k-25k-20k-15k-15k from 86 to 90, and 50k added for cassettes from that era.
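As a quick check, that recalled yearly split does add up to the 175,000 units used for 1986-90 (the variable names are hypothetical, and the figures are the "something like" values above, not confirmed data):

```python
# Recalled yearly CD estimates for 1986-90, in thousands (approximate).
cd_by_year_k = {1986: 50, 1987: 25, 1988: 20, 1989: 15, 1990: 15}
cassettes_k = 50  # cassettes from the same era, in thousands

pre_soundscan_k = sum(cd_by_year_k.values()) + cassettes_k
print(pre_soundscan_k)  # 175
```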
I ignore the 32-owner CD version here as it wasn't scanned: it is the BMG Direct club version.