167 points by matt_d 125 days ago
Guesstimate (cited in this paper in footnote 6 and previously discussed on HN) is a really nice implementation of some of these ideas.
Has anyone come across other good implementations?
As a side note, I've been doing more probabilistic programming with pymc3 recently, and it's pretty incredible how leaky the abstractions can be. I'm not saying there's a way to do better, just that at present there's a huge gap between the beautiful vision of "the inference button" and the current tools.
We're working on something like this: https://causal.app :)
Guesstimate is awesome, but their team sadly stopped working on it a while back. It's definitely early days for automated inference, but I think giving people the tools to build "static" (non-learning) models that can account for uncertainty is hugely valuable in itself. You need serious gymnastics to do this in spreadsheets right now, and I wouldn't wish Excel's probabilistic plugins (Palisade @RISK, Oracle Crystal Ball) on anyone.
But progress towards the "inference button" dream is starting to accelerate:
- TensorFlow recently got its own PPL
- The first international conference on probabilistic programming was held (PROBPROG 2018) 
- Lots of PPL development going on in tech companies: Uber, FB, Google, Microsoft, Stripe, Improbable, etc.
Microsoft recently released a probabilistic programming library for .NET named Infer.NET.
They recently open-sourced it. It was previewed in lectures from Christopher Bishop more than 10 years ago.
>As a side note, I've been doing more probabilistic programming with pymc3 recently, and it's pretty incredible how leaky the abstractions can be. I'm not saying there's a way to do better, just that at present there's a huge gap between the beautiful vision of "the inference button" and the current tools.
Yeah, there's a pretty active debate in the probabilistic programming R&D community over whether it's a bug to sell people on "the inference button", then deliver a leaky abstraction, or a feature to offer richly programmable inference. Our lab has been working on some ideas to get "basic" and "advanced" inference techniques and generative models to compose together nicely to try and build a bridge between the two options:
https://drive.google.com/file/d/1bv8g7KTgpgRLsx3ZcaPzIlhGzSa..., https://arxiv.org/abs/1811.05965
Paper's abstract looks cool, I'd love to take a look at this! Rather than selling "the inference button", I always wondered why more probabilistic frameworks don't play up the "programming" portion of it. Most of us who look into probabilistic programming aren't that afraid of getting our hands dirty.
Hopefully we'll have the actual framework open-sourced and a longer-form paper on arXiv (and under review) within a few months. This has been over a year in development by now, but we're getting there.
Moving from point estimates to distributions is great progress.
It reminds me of this great paper that highlights how much information we lose when we only look at means or assume everything is normally distributed. https://arxiv.org/pdf/1806.02404.pdf
Abstract link: Sandberg, Drexler, and Ord - Dissolving the Fermi paradox (https://arxiv.org/abs/1806.02404).
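To make the "losing information when only looking at means" point concrete, here is a minimal sketch. It is not the paper's model: the three factors and their ranges are hypothetical, chosen only so that each one spans several orders of magnitude. It multiplies the factors sample by sample and compares the mean of the product with its median.

```python
import math
import random
import statistics

def log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def simulate_product(n_samples=100_000, seed=0):
    """Multiply three independent, very uncertain factors, sample by sample."""
    rng = random.Random(seed)
    # Hypothetical ranges (assumptions for illustration, not data from the paper).
    ranges = [(0.001, 1.0), (0.01, 100.0), (1.0, 1000.0)]
    products = []
    for _ in range(n_samples):
        value = 1.0
        for low, high in ranges:
            value *= log_uniform(rng, low, high)
        products.append(value)
    return products

products = simulate_product()
mean_product = statistics.fmean(products)
median_product = statistics.median(products)
share_below_mean = sum(p < mean_product for p in products) / len(products)

print(f"mean   = {mean_product:.3g}")
print(f"median = {median_product:.3g}")
print(f"share of samples below the mean = {share_below_mean:.1%}")
```

With ranges this wide, the mean is pulled up by a small number of very large samples, so most of the distribution sits well below it. A single point estimate hides that.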
It's nowhere near as sophisticated as this, but I wrote a little utility for Monte Carlo analysis of spreadsheet models with Python and xlwings:
As a bonus, since the spreadsheet model is exposed as a Python function, emulating complex spreadsheets with simple ML models (decision trees...) is easy.
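For readers who want the general shape of this approach, here is a rough sketch. It is not the utility mentioned above and does not call xlwings: `spreadsheet_model` is a hypothetical stand-in for a workbook exposed as a Python function, and every input distribution is made up for illustration.

```python
import random
import statistics

def spreadsheet_model(units_sold, unit_price, unit_cost, fixed_costs):
    """Stand-in for a workbook: arguments are input cells, the result is the output cell."""
    return units_sold * (unit_price - unit_cost) - fixed_costs

def monte_carlo(model, input_samplers, n_runs=20_000, seed=42):
    """Run the model n_runs times, drawing every input from its sampler."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        inputs = {name: sample(rng) for name, sample in input_samplers.items()}
        outputs.append(model(**inputs))
    return outputs

# Hypothetical input distributions; none of these numbers come from the thread.
samplers = {
    "units_sold": lambda rng: rng.triangular(500, 2000, 1000),  # low, high, mode
    "unit_price": lambda rng: rng.uniform(18.0, 22.0),
    "unit_cost": lambda rng: rng.gauss(12.0, 1.0),
    "fixed_costs": lambda rng: 6000.0,  # treated as known exactly
}

profits = monte_carlo(spreadsheet_model, samplers)
mean_profit = statistics.fmean(profits)
deciles = statistics.quantiles(profits, n=10)  # 9 cut points: 10%, 20%, ..., 90%
loss_probability = sum(p < 0 for p in profits) / len(profits)

print(f"mean profit     = {mean_profit:,.0f}")
print(f"10th percentile = {deciles[0]:,.0f}")
print(f"90th percentile = {deciles[-1]:,.0f}")
print(f"P(loss)         = {loss_probability:.1%}")
```

Because the model is just a callable, the same `monte_carlo` loop would work with a function that writes inputs to a real workbook and reads back the output cell.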
Nice. The Excel/Python combo is very powerful. Microsoft has kept Excel in the dark ages, and could have implemented much better analytics than it currently offers. If they could integrate Python in lieu of VBA and make it compatible with their Form designer, it would become very popular. The ability to quickly knock up desktop interfaces, closely integrated with powerful analytics and the user’s data source, is priceless in certain environments (e.g. capital markets).
Interesting but hardly novel. In certain fields, ‘end users’ have routinely employed probabilistic tools via Excel or other spreadsheets for decades. Wider adoption has been limited by the knowledge required to either (a) confidently build such models or (b) communicate probabilistic results to stakeholders.
@RISK and Crystal Ball were among the earlier Excel add-ins that simplified simulation-based spreadsheet development.
As someone else mentioned, the Excel/Python combination is really powerful, although lower-level. DataNitro comes to mind, as well as a product by Resolver Systems (?) which was essentially an IronPython powered spreadsheet interface.
Dead link for some reason: https://web.archive.org/web/20190619190008/https://www.cs.uo...
Let's say I want to predict an output `C` by multiplying two distributions: `A` * `B` = `C`.
Assuming I am just guessing at the distributions of `A` and `B` (Uniform? Bernoulli? Geometric? Log-Normal?), would I get a better estimate by just multiplying `mean(A)` * `mean(B)`?
Point values suck. However, predicting the mean is often possible/realistic. And I feel like I am taking a wild guess when describing the distribution of a data set, to be honest.
What results in a better prediction/guesstimate? Multiplying incorrect probability distributions? Or multiplying more-correct means/point values?
I don't have a good answer. But I wonder if there are some realistic situations where we would have a good guess at the mean, but no clue about the distribution.
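One fact that bears on this question: E[A*B] = E[A]*E[B] + Cov(A, B). So if `A` and `B` are independent, multiplying the means gives the correct mean of `C` whatever the shapes are, but it says nothing about the spread of `C`. If they are correlated, the product of means is off by the covariance. A minimal simulation sketch, with distributions that are purely hypothetical choices:

```python
import random
import statistics

def mean_gap(a, b):
    """Return mean(a*b) - mean(a)*mean(b), i.e. the sample covariance term."""
    product_mean = statistics.fmean(x * y for x, y in zip(a, b))
    return product_mean - statistics.fmean(a) * statistics.fmean(b)

rng = random.Random(1)
n = 200_000

# Hypothetical inputs: A is log-normal, B is uniform and independent of A.
a = [rng.lognormvariate(0.0, 0.75) for _ in range(n)]
b_indep = [rng.uniform(2.0, 6.0) for _ in range(n)]
# A second B that rises with A, so Cov(A, B) > 0.
b_corr = [2.0 + x + rng.uniform(0.0, 1.0) for x in a]

gap_independent = mean_gap(a, b_indep)
gap_correlated = mean_gap(a, b_corr)

# The point estimate versus the spread of C in the independent case.
c = [x * y for x, y in zip(a, b_indep)]
cuts = statistics.quantiles(c, n=20)  # 19 cut points: 5%, 10%, ..., 95%
point_estimate = statistics.fmean(a) * statistics.fmean(b_indep)

print(f"independent: mean(A*B) - mean(A)*mean(B) = {gap_independent:+.3f}")
print(f"correlated:  mean(A*B) - mean(A)*mean(B) = {gap_correlated:+.3f}")
print(f"point estimate of C = {point_estimate:.2f}")
print(f"5th to 95th percentile of C = {cuts[0]:.2f} to {cuts[-1]:.2f}")
```

So a good guess at the means is enough for the mean of `C` under independence. The distributional guess starts to matter when you care about percentiles or tail risk, or when the inputs move together.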