Thursday, August 22, 2019

Preindustrial workers worked fewer hours than today's

Work can bring purpose to life, but it can also be a convenient way to keep you occupied and away from serious business. Work is not equivalent to production, and it is not bliss. It is necessary to stay alive and can bring happiness and purpose, but it needs to be purposeful in itself and produce actual, tangible results.

There is an overwork and underproduction crisis. People toil in bullshit jobs, shuffling paper around and wearing themselves out, while at the same time producing nothing.

To free ourselves from normalcy bias, we have to look at other times, when things were simpler and there wasn't much room for non-producing workers.

From the paper "Preindustrial workers worked fewer hours than today's"

"The contrast between capitalist and precapitalist work patterns is most striking in respect to the working year. The medieval calendar was filled with holidays. Official -- that is, church -- holidays included not only long "vacations" at Christmas, Easter, and midsummer but also numerous saints days. These were spent both in sober churchgoing and in feasting, drinking and merrymaking. In addition to official celebrations, there were often week's worth of ales -- to mark important life events (bride ales or wake ales) as well as less momentous occasions (scot ale, lamb ale, and hock ale). All told, holiday leisure time in medieval England took up probably about one-third of the year. And the English were apparently working harder than their neighbors. The ancient regime in France is reported to have guaranteed fifty-two Sundays, ninety rest days, and thirty-eight holidays. In Spain, travelers noted that holidays totaled five months per year."

From the article " On the Phenomenon of Bullshit Jobs: A Work Rant by David Graeber "

In the year 1930, John Maynard Keynes predicted that, by century's end, technology would have advanced  sufficiently that countries like Great Britain or the United States would have achieved a 15-hour work week. There's every reason to believe he was right. In technological terms, we are quite capable of this. And yet it didn't happen. Instead, technology has been marshaled, if anything, to figure out ways to make us all work more. In order to achieve this, jobs have had to be created that are, effectively, pointless. Huge swathes of people, in Europe and North America in particular, spend their entire working lives performing tasks they secretly believe do not really need to be performed. The moral and spiritual damage that comes from this situation is profound. It is a scar across our collective soul. Yet virtually no one talks about it.

Why did Keynes' promised utopia—still being eagerly awaited in the '60s—never materialise? The standard line today is that he didn't figure in the massive increase in consumerism. Given the choice between less hours and more toys and pleasures, we've collectively chosen the latter. This presents a nice morality tale, but even a moment's reflection shows it can't really be true. Yes, we have witnessed the creation of an endless variety of new jobs and industries since the '20s, but very few have anything to do with the production and distribution of sushi, iPhones, or fancy sneakers.

In Bullshit Jobs, American anthropologist David Graeber posits that the productivity benefits of automation have not led to a 15-hour workweek, but instead to "bullshit jobs": "a form of paid employment that is so completely pointless, unnecessary, or pernicious that even the employee cannot justify its existence even though, as part of the conditions of employment, the employee feels obliged to pretend that this is not the case."

The author contends that more than half of societal work is pointless, both large parts of some jobs and, as he describes, five types of entirely pointless jobs:
  1. flunkies, who serve to make their superiors feel important, e.g., receptionists, administrative assistants, door attendants
  2. goons, who act aggressively on behalf of their employers, e.g., lobbyists, corporate lawyers, telemarketers, public relations specialists
  3. duct tapers, who ameliorate preventable problems, e.g., programmers repairing shoddy code, airline desk staff who calm passengers whose bags don't arrive
  4. box tickers, who use paperwork or gestures as a proxy for action, e.g., performance managers, in-house magazine journalists, leisure coordinators
  5. taskmasters, who manage—or create extra work for—those who don't need it, e.g., middle management, leadership professionals
From the book "Bullshit Jobs"

Wednesday, August 21, 2019

Identity Mappings in Deep Residual Networks, 2016

A very nice improvement over the original ResNet.

In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
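
To make the idea concrete, here is a minimal sketch of a pre-activation residual unit in PyTorch; the channel count and layer choices are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class PreActResidualBlock(nn.Module):
    """Pre-activation residual unit: y = x + F(x), where F = BN -> ReLU -> conv -> BN -> ReLU -> conv.
    The skip connection is a pure identity and there is no after-addition activation,
    so signals can propagate directly between blocks in both the forward and backward pass."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # identity mapping, no post-addition activation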

Tuesday, August 20, 2019

Companies with overpaid CEOs have markedly underperformed the S&P 500

The companies with overpaid CEOs we identified in our first report have markedly underperformed the S&P 500.

Two years ago, we analyzed how these firms’ stock price performed since we originally identified their CEOs as overpaid. We found then that the 10 companies we identified as having the most overpaid CEOs, in aggregate, underperformed the S&P 500 index by an incredible 10.5 percentage points and actually destroyed shareholder value, with a negative 5.7 percent financial return. The trend continues to hold true as we measure performance to year-end 2018. Last year, these 10 firms again, in aggregate, dramatically underperformed the S&P 500 index, this time by an embarrassing 15.6 percentage points.

In analyzing almost 4 years of returns for these 10 companies, we find that they lag the S&P 500 by 14.3 percentage points, posting an overall loss in value of over 11 percent.

Consistent with our 2018 report, this year we used a two-ranking methodology to identify overpaid CEOs.
1. The first is the same HIP Investor regression we’ve used every year that computes excess CEO pay assuming such pay is related to total shareholder return (TSR). 
2. The second ranking identified the companies where the most shares were voted against the CEO pay package. 

These two rankings were weighted 2:1, with the regression analysis carrying the greater weight. We then excluded those CEOs whose total disclosed compensation (TDC) was in the lowest third of all the S&P 500 CEO pay packages. The full list of the 100 most overpaid CEOs using this methodology is found in Appendix A. The regression analysis of predicted and excess pay performed by HIP Investor is found in Appendix C, and its methodology is more fully explained there.

From the article
https://www.asyousow.org/report/the-100-most-overpaid-ceos-2019

 

Machine Learning Engineering : Tests for Infrastructure

An ML system often relies on a complex pipeline rather than a single running binary.

Engineering checklist:

  1. Test the reproducibility of training
  2. Unit test model specification code
  3. Integration test the full ML pipeline
  4. Test model quality before attempting to serve it
  5. Test that a single example or training batch can be sent to the model
  6. Test models via a canary process before they enter production serving environments
  7. Test how quickly and safely a model can be rolled back to a previous serving version

1. Test the reproducibility of training. Train two models on the same data, and observe any differences in aggregate metrics, sliced metrics, or example-by-example predictions. Large differences due to non-determinism can exacerbate debugging and troubleshooting.
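
A sketch of what such a test might look like, assuming hypothetical train_model and evaluate entry points into your own pipeline (evaluate returning a dict of named metrics) and models with a predict method; the tolerance is illustrative.

import numpy as np

def test_training_is_reproducible(train_model, evaluate, train_data, eval_data, tolerance=1e-3):
    """Train two models on identical data with identical seeds and compare them."""
    model_a = train_model(train_data, seed=42)
    model_b = train_model(train_data, seed=42)

    # Aggregate metrics should agree within a small tolerance.
    metrics_a, metrics_b = evaluate(model_a, eval_data), evaluate(model_b, eval_data)
    for name in metrics_a:
        assert abs(metrics_a[name] - metrics_b[name]) < tolerance, f"metric {name} diverged"

    # Example-by-example predictions should also be close.
    preds_a = model_a.predict(eval_data)
    preds_b = model_b.predict(eval_data)
    assert np.allclose(preds_a, preds_b, atol=tolerance), "per-example predictions diverged"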

2. Unit test model specification code. Although model specifications may seem like “configuration”, such files can have bugs and need to be tested. Useful assertions include testing that training results in decreased loss and that a model can restore from a checkpoint after a mid-training job crash.
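
Two hedged unit tests along these lines, sketched with PyTorch and pytest; build_model, loss_fn and batch are stand-ins for your own fixtures, and the step count and learning rate are illustrative.

import torch

def test_loss_decreases_after_a_few_steps(build_model, loss_fn, batch):
    """A model built from the specification should fit a single batch at least a little."""
    model = build_model()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    initial_loss = loss_fn(model(batch.inputs), batch.targets).item()
    for _ in range(20):
        optimizer.zero_grad()
        loss = loss_fn(model(batch.inputs), batch.targets)
        loss.backward()
        optimizer.step()
    assert loss.item() < initial_loss, "training did not reduce the loss"

def test_model_restores_from_checkpoint(build_model, tmp_path):
    """Saving and reloading should reproduce the same parameters, as after a mid-training crash."""
    model = build_model()
    torch.save(model.state_dict(), tmp_path / "ckpt.pt")
    restored = build_model()
    restored.load_state_dict(torch.load(tmp_path / "ckpt.pt"))
    for p1, p2 in zip(model.parameters(), restored.parameters()):
        assert torch.equal(p1, p2)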

3. Integration test the full ML pipeline. A good integration test runs all the way from original data sources, through feature creation, to training, and to serving. An integration test should run both continuously as well as with new releases of models or servers, in order to catch problems well before they reach production.
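
A minimal sketch of such an end-to-end test; every argument below is a placeholder for one of your own pipeline stages, and the sample sizes are illustrative.

def test_full_pipeline_end_to_end(raw_source, feature_fn, train_model, serve, sample_requests):
    """Run a small slice of data all the way from raw source to a served prediction."""
    raw = raw_source.read(limit=1000)              # 1. pull a small sample of original data
    features = feature_fn(raw)                     # 2. feature creation
    model = train_model(features, max_steps=50)    # 3. a short training run
    server = serve(model)                          # 4. load into the serving stack
    for request in sample_requests:
        response = server.predict(request)
        assert response is not None                # 5. sanity-check served output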

4. Test model quality before attempting to serve it. Useful tests include testing against data with known correct outputs and validating the aggregate quality, as well as comparing predictions to a previous version of the model.
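
One possible shape for this check, assuming hypothetical candidate/previous model objects, a small set of golden examples with known-correct outputs, and an evaluate helper; the regression threshold is illustrative.

def test_model_quality_before_serving(candidate, previous, golden_set, eval_data, evaluate):
    """Block the push if the candidate regresses on golden examples or on aggregate quality."""
    # Known-correct outputs must still be predicted correctly.
    for example in golden_set:
        assert candidate.predict(example.inputs) == example.expected_output

    # Aggregate quality should not fall behind the currently served model.
    assert evaluate(candidate, eval_data) >= evaluate(previous, eval_data) - 0.005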

5. Test that a single example or training batch can be sent to the model, and changes to internal state can be observed from training through to prediction. Observing internal state on small amounts of data is a useful debugging strategy for issues like numerical instability.
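
One possible shape for this test, assuming a Keras-style get_weights/predict API and a hypothetical loss_step helper that runs one optimization step and returns the loss.

import numpy as np

def test_single_batch_runs_and_updates_state(build_model, loss_step, tiny_batch):
    """Push one small batch through training, then check that parameters actually change
    and that nothing becomes NaN or infinite (a cheap probe for numerical instability)."""
    model = build_model()
    before = [p.copy() for p in model.get_weights()]   # assumes a Keras-style API
    loss = loss_step(model, tiny_batch)                # one optimization step
    after = model.get_weights()

    assert np.isfinite(loss), "loss is NaN or infinite on a single batch"
    assert any(not np.allclose(b, a) for b, a in zip(before, after)), "no parameter changed"
    prediction = model.predict(tiny_batch.inputs)
    assert np.all(np.isfinite(prediction)), "predictions contain NaN/inf"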

6. Test models via a canary process before they enter production serving environments. Modeling code can change more frequently than serving code, so there is a danger that an older serving system will not be able to serve a model trained from newer code. This includes testing that a model can be loaded into the production serving binaries and perform inference on production input data at all. It also includes a canary process, in which a new version is tested on a small trickle of live data.
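
A sketch of a canary check; serving_binary, candidate_model, mirrored_traffic and try_predict are placeholders for your own serving stack, and the acceptable error rate is illustrative.

def test_canary_before_full_rollout(serving_binary, candidate_model, mirrored_traffic):
    """Load the candidate into the real serving binary and replay a small trickle of
    live traffic against it before any user-facing rollout."""
    server = serving_binary.load(candidate_model)   # fails fast if the serving code is too old
    failures = sum(1 for request in mirrored_traffic if not server.try_predict(request))
    assert failures / len(mirrored_traffic) < 0.001, "canary error rate too high"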


7. Test how quickly and safely a model can be rolled back to a previous serving version. A model “roll back” procedure is useful in cases where upstream issues might result in unexpected changes to model quality. Being able to quickly revert to a previous known-good state is as crucial with ML models as with any other aspect of a serving system.
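
A sketch of a rollback drill, assuming hypothetical deployment tooling; the time budget is illustrative.

import time

def test_rollback_to_previous_version(deployer, current_version, previous_version, max_seconds=300):
    """Roll the serving stack back to the last known-good model and measure how long it takes."""
    start = time.monotonic()
    deployer.deploy(previous_version)
    assert deployer.live_version() == previous_version
    assert time.monotonic() - start < max_seconds, "rollback took too long"
    deployer.deploy(current_version)  # restore state after the drill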

* From  "What’s your ML Test Score? A rubric for ML production systems" NIPS, 2016


Wednesday, August 14, 2019

Machine Learning Engineering : Tests for Model Development

While the field of software engineering has developed a full range of best practices for developing reliable software systems, the set of standards and practices for developing ML models in a rigorous fashion is still developing. It can be all too tempting to rely on a single-number summary metric to judge performance, perhaps masking subtle areas of unreliability. Careful testing is needed to search for potential lurking issues.

Engineering checklist:

  1. Test that every model specification undergoes a code review and is checked in to a repository
  2. Test the relationship between offline proxy metrics and the actual impact metrics
  3. Test the impact of each tunable hyperparameter
  4. Test the effect of model staleness. Concept drift is real for non-stationary processes
  5. Test against a simpler model as a baseline
  6. Test model quality on important data slices
  7. Test the model for implicit bias

1. Test that every model specification undergoes a code review and is checked in to a repository. It can be tempting to avoid, but disciplined code review remains an excellent method for avoiding silly errors and for enabling more efficient incident response and debugging.

2. Test the relationship between offline proxy metrics and the actual impact metrics. For example, how does a one-percent improvement in accuracy or AUC translate into effects on metrics of user satisfaction, such as click-through rates? This can be measured in a small-scale A/B experiment using an intentionally degraded model.

3. Test the impact of each tunable hyperparameter. Methods such as a grid search or a more sophisticated hyperparameter search strategy not only improve predictive performance, but also can uncover hidden reliability issues. For example, it can be surprising to observe the impact of massive increases in data parallelism on model accuracy.
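
For instance, a small grid search with scikit-learn; the estimator, grid and data (X_train, y_train) are stand-ins, and the point is to inspect the whole result surface, not only the best setting.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_train, y_train)                   # X_train / y_train are assumed to exist
print(grid.best_params_)
print(grid.cv_results_["mean_test_score"])   # look for surprising drops, not just the best point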

4. Test the effect of model staleness. If predictions are based on a model trained yesterday versus last week versus last year, what is the impact on the live metrics of interest? All models need to be updated eventually to account for changes in the external world; a careful assessment is important to guide such decisions.
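
A sketch of a staleness probe, assuming hypothetical train_model/evaluate helpers and a data store that can be queried as of a past date; the ages and the acceptable decay are illustrative.

def test_model_staleness(train_model, evaluate, historical_data, current_eval_data):
    """Compare models trained on data snapshots of different ages against today's data
    to estimate how quickly quality decays."""
    results = {}
    for age_in_days in (1, 7, 30, 365):
        snapshot = historical_data.as_of(days_ago=age_in_days)
        model = train_model(snapshot)
        results[age_in_days] = evaluate(model, current_eval_data)

    # Quantify the decay so the retraining cadence can be chosen deliberately.
    print({age: round(score, 4) for age, score in results.items()})
    assert results[1] - results[30] < 0.05, "quality decays too fast for a monthly refresh"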

5. Test against a simpler model as a baseline. Regularly testing against a very simple baseline model, such as a linear model with very few features, is an effective strategy both for confirming the functionality of the larger pipeline and for helping to assess the cost to benefit tradeoffs of more sophisticated techniques.
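
For example, with scikit-learn, comparing the production model against a majority-class dummy and a small linear model; model, X_train, y_train, X_test and y_test are assumed to exist already.

from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

baselines = {
    "majority_class": DummyClassifier(strategy="most_frequent"),
    "linear_few_features": LogisticRegression(max_iter=1000),
}
for name, baseline in baselines.items():
    baseline.fit(X_train, y_train)
    auc = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")

model_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"production model: AUC = {model_auc:.3f}")
# The gap between the production model and the baselines is the payoff of the added complexity.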

6. Test model quality on important data slices. Slicing a data set along certain dimensions of interest provides fine-grained understanding of model performance. For example, important slices might be users by country or movies by genre. Examining sliced data avoids having fine-grained performance issues masked by a global summary metric.
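
A small helper for slicing an evaluation dataframe with pandas; the label, prediction and slice column names are illustrative.

import pandas as pd
from sklearn.metrics import accuracy_score

def sliced_accuracy(eval_df: pd.DataFrame, slice_column: str) -> pd.Series:
    """Per-slice accuracy from a dataframe with `label` and `prediction` columns."""
    return eval_df.groupby(slice_column).apply(
        lambda g: accuracy_score(g["label"], g["prediction"])
    )

# e.g. sliced_accuracy(eval_df, "country") or sliced_accuracy(eval_df, "genre");
# a slice that falls well below the global metric is a red flag hidden by the average.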

7. Test the model for implicit bias. This may be viewed as an extension of examining important data slices, and may reveal issues that can be root-caused and addressed. For example, implicit bias might be induced by a lack of sufficient diversity in the training data.

* From  "What’s your ML Test Score? A rubric for ML production systems" NIPS, 2016




Machine Learning Engineering : Tests for Features and Data

Machine learning systems differ from traditional software-based systems in that the behavior of ML systems is not specified directly in code but is learned from data. Therefore, while traditional software can rely on unit tests and integration tests of the code, here we attempt to add a sufficient set of tests of the data.


Engineering checklist:

  1. Test that the distributions of each feature match your expectations
  2. Test the relationship between each feature and the target 
  3. Test the cost of each feature
  4. Test that a model does not contain any features that are unsuitable for use
  5. Test that your system maintains privacy controls across its entire data pipeline
  6. Test all code that creates input features

1. Test that the distributions of each feature match your expectations. One example might be to test that Feature A takes on values 1 to 5, or that the two most common values of Feature B are "Harry" and "Potter" and they account for 10% of all values. This test can fail due to real external changes, which may require changes in your model.
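
A sketch of such expectations with pandas, using the examples above; column names and thresholds are illustrative.

import pandas as pd

def test_feature_distributions(features: pd.DataFrame):
    """Schema-style expectations over a snapshot of the feature data."""
    # Feature A takes on values 1 to 5 only.
    assert features["feature_a"].between(1, 5).all()

    # The two most common values of Feature B are "Harry" and "Potter"
    # and together account for at least 10% of all values.
    top_two = features["feature_b"].value_counts(normalize=True).head(2)
    assert set(top_two.index) == {"Harry", "Potter"}
    assert top_two.sum() >= 0.10

    # Missing-value rate stays within the expected bound.
    assert features["feature_a"].isna().mean() < 0.01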

2. Test the relationship between each feature and the target, and the pairwise correlations between individual signals. It is important to have a thorough understanding of the individual features used in a given model; this is a minimal set of tests, and more exploration may be needed to develop a full understanding. These tests may be run by computing correlation coefficients, by training models with one or two features, or by training a set of models that each have one of k features individually removed.
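
A cheap first pass with pandas and NumPy, not a substitute for the deeper analyses mentioned above; it assumes numeric feature columns and a numeric target.

import numpy as np
import pandas as pd

def feature_target_report(features: pd.DataFrame, target: pd.Series) -> pd.DataFrame:
    """Correlation of each feature with the target, plus its strongest pairwise
    correlation with any other feature (a hint of redundant signals)."""
    corr_with_target = features.corrwith(target)
    pairwise = features.corr().abs()
    pairwise = pairwise.mask(np.eye(len(pairwise), dtype=bool))   # ignore self-correlation
    return pd.DataFrame({
        "corr_with_target": corr_with_target,
        "max_corr_with_other_feature": pairwise.max(),
    }).sort_values("corr_with_target", key=abs, ascending=False)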

3. Test the cost of each feature. The costs of a feature may include added inference latency and RAM usage, more upstream data dependencies, and additional expected instability incurred by relying on that feature. Consider whether this cost is worth paying when traded off against the provided improvement in model quality.

4. Test that a model does not contain any features that have been manually determined as unsuitable for use. A feature might be unsuitable when it’s been discovered to be unreliable, overly expensive, etc. Tests are needed to ensure that such features are not accidentally included (e.g. via copy-paste) into new models.
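
One simple way to enforce this, with a hypothetical blocklist and a hypothetical model_config accessor that reports the features a model actually uses.

BLOCKLISTED_FEATURES = {"user_ssn_hash", "deprecated_click_score", "expensive_geo_lookup"}

def test_no_blocklisted_features(model_config):
    """Fail if any manually flagged feature sneaks into a new model (e.g. via copy-paste)."""
    used = set(model_config.feature_names())
    forbidden = used & BLOCKLISTED_FEATURES
    assert not forbidden, f"model uses blocklisted features: {sorted(forbidden)}"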

5. Test that your system maintains privacy controls across its entire data pipeline. While strict access control is typically maintained on raw data, ML systems often export and transform that data during training. Test to ensure that access control is appropriately restricted across the entire pipeline. Relatedly, test the calendar time needed to develop and add a new feature to the production model: the faster a team can go from a feature idea to it running in production, the faster it can both improve the system and respond to external changes.

6. Test all code that creates input features, both in training and serving. It can be tempting to believe feature creation code is simple enough to not need unit tests, but this code is crucial for correct behavior and so its continued quality is vital.
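
For example, for a hypothetical bucketize_age feature function shared by training and serving, the tests below cover edge cases and training/serving agreement.

import pandas as pd

def bucketize_age(age: float) -> str:
    """Hypothetical feature-creation function used in both training and serving."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

def test_bucketize_age_edge_cases():
    assert bucketize_age(17.9) == "minor"
    assert bucketize_age(18) == "adult"
    assert bucketize_age(65) == "senior"

def test_training_and_serving_agree():
    """The same raw input must produce the same feature in both code paths."""
    raw = pd.DataFrame({"age": [3, 18, 44, 65, 90]})
    training_features = raw["age"].map(bucketize_age)          # batch/training path
    serving_features = [bucketize_age(a) for a in raw["age"]]  # online/serving path
    assert list(training_features) == serving_features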

* From  "What’s your ML Test Score? A rubric for ML production systems" NIPS, 2016



Tuesday, August 6, 2019

What is deep learning?

Long Story Short

I like this definition: clean, concise, and to the point, with zero marketing fluff.
“A class of parametrized non-linear representations encoding appropriate domain knowledge (invariance and stationarity) that can be (massively) optimized efficiently using stochastic gradient descent”
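
A toy sketch of that definition in PyTorch: a parametrized non-linear map whose convolutional weight sharing encodes a stationarity/invariance assumption, fitted by stochastic gradient descent; the architecture, data and hyperparameters are stand-ins.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # weight sharing = stationarity assumption
    nn.ReLU(),                                   # non-linearity
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):                             # stochastic: one random mini-batch per step
    x = torch.randn(32, 1, 28, 28)               # stand-in data; real data goes here
    y = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()              # gradients via backpropagation
    optimizer.step()                             # gradient descent update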

What do neural networks actually do?