End-User Probabilistic Programming [pdf] (cs.uoregon.edu)

167 points by matt_d 27 days ago

15 comments

exp1orer 26 days ago

Guesstimate [1] (cited in this paper in footnote 6 and previously discussed on HN [2]) is a really nice implementation of some of these ideas.

Has anyone come across other good implementations?

As a side note, I've been doing more probabilistic programming with pymc3 recently, and it's pretty incredible how leaky the abstractions can be. I'm not saying there's a way to do better, just that at present there's a huge gap between the beautiful vision of "the inference button" and the current tools.
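
For concreteness, the "inference button" workflow in pymc3 looks roughly like this (a minimal sketch with made-up data; the pain starts once the sampler diverges and the abstraction stops hiding the model's parameterization from you):

  import numpy as np
  import pymc3 as pm

  # Fake data: 100 noisy observations of an unknown mean.
  data = np.random.normal(loc=2.0, scale=1.0, size=100)

  with pm.Model() as model:
      mu = pm.Normal("mu", mu=0.0, sd=10.0)       # prior on the mean
      sigma = pm.HalfNormal("sigma", sd=5.0)      # prior on the noise scale
      pm.Normal("obs", mu=mu, sd=sigma, observed=data)

      # "The inference button": one call is supposed to do all the work.
      trace = pm.sample(1000, tune=1000)

  print(pm.summary(trace))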

[1] https://www.getguesstimate.com/

[2] https://news.ycombinator.com/item?id=18785371

    refrigerator 26 days ago

    We're working on something like this: https://causal.app :)

    Guesstimate is awesome, but their team sadly stopped working on it a while back. It's definitely early days for automated inference, but I think giving people the tools to build "static" (non-learning) models that can account for uncertainty is hugely valuable in itself. You need serious gymnastics to do this in spreadsheets right now, and I wouldn't wish Excel's probabilistic plugins (Palisade @RISK, Oracle Crystal Ball) on anyone.
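
    Concretely, the kind of "static" model I mean is just Monte Carlo propagation of uncertain inputs through ordinary formulas. A rough numpy sketch (all names and numbers made up):

      import numpy as np

      N = 100_000  # number of Monte Carlo samples

      # Uncertain inputs, expressed as distributions instead of single cells.
      visitors   = np.random.lognormal(mean=np.log(50_000), sigma=0.3, size=N)
      conversion = np.random.beta(a=20, b=980, size=N)      # ~2% conversion rate
      price      = np.random.uniform(9.0, 15.0, size=N)     # per-unit price

      # Ordinary spreadsheet-style formula, evaluated over whole samples.
      revenue = visitors * conversion * price

      lo, mid, hi = np.percentile(revenue, [5, 50, 95])
      print(f"revenue: 5th={lo:,.0f}  median={mid:,.0f}  95th={hi:,.0f}")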

    But progress towards the "inference button" dream is starting to accelerate:

    - TensorFlow recently got its own PPL [1]

    - The first international conference on probabilistic programming was held (PROBPROG 2018) [2]

    - Lots of PPL development going on in tech companies: Uber, FB, Google, Microsoft, Stripe, Improbable, etc.

    [1]: https://www.tensorflow.org/probability

    [2]: https://probprog.cc/

    eli_gottlieb 26 days ago

    >As a side note, I've been doing more probabilistic programming with pymc3 recently, and it's pretty incredible how leaky the abstractions can be. I'm not saying there's a way to do better, just that at present there's a huge gap between the beautiful vision of "the inference button" and the current tools.

    Yeah, there's a pretty active debate in the probabilistic programming R&D community over whether selling people on "the inference button" and then delivering a leaky abstraction is a bug, or whether offering richly programmable inference instead is a feature. Our lab has been working on some ideas to get "basic" and "advanced" inference techniques and generative models to compose together nicely, to try to build a bridge between the two options [1] (toy sketch of the distinction below).

    [1] https://drive.google.com/file/d/1bv8g7KTgpgRLsx3ZcaPzIlhGzSa..., https://arxiv.org/abs/1811.05965
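
    To give a feel for "basic" vs. "programmable" (this is just the textbook toy version in plain numpy/scipy, not our framework): the same generative model can be paired either with a generic prior-as-proposal importance sampler or with a hand-written, data-driven proposal, and the whole game is making those choices compose cleanly.

      import numpy as np
      from scipy.stats import norm

      # Toy model: theta ~ Normal(0, 3), x ~ Normal(theta, 1), one observation.
      x_obs = 4.0
      N = 50_000
      rng = np.random.default_rng(0)

      def posterior_mean_generic():
          # "Basic" inference: sample from the prior, weight by the likelihood.
          theta = rng.normal(0.0, 3.0, size=N)
          w = norm.pdf(x_obs, loc=theta, scale=1.0)
          return np.average(theta, weights=w)

      def posterior_mean_custom():
          # "Programmable" inference: a user-supplied proposal centred on the
          # data, weighted by prior * likelihood / proposal.
          theta = rng.normal(x_obs, 1.0, size=N)
          w = (norm.pdf(theta, 0.0, 3.0) * norm.pdf(x_obs, theta, 1.0)
               / norm.pdf(theta, x_obs, 1.0))
          return np.average(theta, weights=w)

      print("generic :", posterior_mean_generic())
      print("custom  :", posterior_mean_custom())
      print("exact   :", x_obs * 9 / 10)  # conjugate closed form: prior var 9, noise var 1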

      Karrot_Kream 26 days ago

      The paper's abstract looks cool; I'd love to take a look at this! Rather than selling "the inference button", I've always wondered why more probabilistic frameworks don't play up the "programming" part of it. Most of us who look into probabilistic programming aren't that afraid of getting our hands dirty.

        eli_gottlieb 26 days ago

        Hopefully we'll have the actual framework open-sourced, and a longer-form paper on arXiv (and under review), within a few months. It has been in development for over a year now, but we're getting there.

thanatropism 26 days ago

It's nowhere near as sophisticated as this, but I wrote a little utility for Monte Carlo analysis of spreadsheet models with Python and xlwings:

https://github.com/asemic-horizon/stanton

As a bonus, since the spreadsheet model is exposed as a Python function, emulating complex spreadsheets with simple ML models (decision trees...) is easy.
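
Roughly, once the workbook is callable as a plain function (the wrapper below is a stand-in, not stanton's actual interface), the emulation step is just fitting a cheap surrogate to sampled inputs and outputs:

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor

  # Stand-in for a spreadsheet exposed as a Python function; in practice this
  # would round-trip through the workbook via xlwings.
  def spreadsheet_model(price, volume, cost):
      return (price - cost) * volume

  rng = np.random.default_rng(0)
  n = 5_000
  X = np.column_stack([
      rng.uniform(8, 15, n),       # price
      rng.uniform(1e3, 1e4, n),    # volume
      rng.uniform(5, 9, n),        # cost
  ])
  y = spreadsheet_model(X[:, 0], X[:, 1], X[:, 2])

  # Cheap surrogate ("emulator") for the slow spreadsheet round-trip.
  surrogate = DecisionTreeRegressor(max_depth=6).fit(X, y)
  print(surrogate.predict([[10.0, 5_000.0, 7.0]]))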

    floki999 26 days ago

    Nice. The Excel/Python combo is very powerful. Microsoft has kept Excel in the dark ages and could have shipped far better analytics than it currently offers. If they integrated Python in lieu of VBA and made it compatible with their form designer, it would become very popular. The ability to quickly knock up desktop interfaces, closely integrated with powerful analytics and the user’s data source, is priceless in certain environments (e.g. capital markets).

floki999 26 days ago

Interesting, but hardly novel. In certain fields, ‘end users’ have routinely employed probabilistic tools via Excel or other spreadsheets for decades. Wider adoption has been limited by the knowledge base required to (a) confidently build such models and (b) communicate probabilistic results to stakeholders.

@RISK and Crystal Ball were among the earlier Excel add-ins that simplified simulation-based spreadsheet development.

As someone else mentioned, the Excel/Python combination is really powerful, although lower-level. DataNitro comes to mind, as does a product by Resolver Systems (?) that was essentially an IronPython-powered spreadsheet interface.

axpence 26 days ago

Let's say I want to predict an output `C` by multiplying two distributions: `A` * `B` = `C`.

Assuming I am just guessing at the distributions of `A` and `B` (Uniform? Bernoulli? Geometric? Log-Normal?), would I get a better estimate by just multiplying `mean(A)` * `mean(B)`?

Point values suck. However, predicting the mean is often possible/realistic. And to be honest, I feel like I am taking a wild guess when describing the distribution of a data set.

TL;DR: Which gives the better prediction/guesstimate: multiplying incorrect probability distributions, or multiplying more-correct means/point values?
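
To make the comparison concrete (toy numbers, independent A and B):

  import numpy as np

  rng = np.random.default_rng(0)
  n = 1_000_000

  # Guessed distributions for A and B (the guessing is the whole problem).
  A = rng.lognormal(mean=0.0, sigma=0.5, size=n)
  B = rng.uniform(1.0, 3.0, size=n)

  C = A * B  # full distributional answer

  print("mean(A) * mean(B):", A.mean() * B.mean())
  print("mean(A * B)      :", C.mean())
  print("5th/95th pct of C:", np.percentile(C, [5, 95]))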

    jmmcd 25 days ago

    I don't have a good answer. But I wonder if there are some realistic situations where we would have a good guess at the mean, but no clue about the distribution.